Commit Graph

2824 Commits

Author SHA1 Message Date
Clint Wylie 5baa22148e
revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895)
* revert ColumnAnalysis type, add typeSignature and use it for DruidSchema

* review stuffs

* maybe null

* better maybe null

* Update docs/querying/segmentmetadataquery.md

* Update docs/querying/segmentmetadataquery.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* fix null right

* sad

* oops

* Update batch_hadoop_queries.json

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-11-10 18:46:29 -08:00
Maytas Monsereenusorn a36a41da73
Support routing data through an HTTP proxy (#11891)
* Support routing data through an HTTP proxy

* Support routing data through an HTTP proxy

This adds the ability for the HttpClient to connect through an HTTP proxy.  We
augment the channel factory to check if it is supposed to be proxied and, if so,
we connect to the proxy host first, issue a CONNECT command through to the final
recipient host and *then* give the channel to the normal http client for usage.

* add docs

* address comments

Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2021-11-09 17:24:06 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Jian Wang 8e7e679984
Add more metrics for Jetty server thread pool usage (#11113)
Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.
2021-11-07 16:51:44 +05:30
zachjsh 1d6df48145
Warn if cache size of lookup is beyond max size (#11863)
Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality, is make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in, to warn when its cache is beyond mxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the services configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underling table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this pr makes it so that the jdbc query results are instead streamed through an iterator.
2021-11-03 21:32:22 -04:00
Kashif Faraz a22687ecbe
Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to skip the realtime nodes and
thus it would query only Historical processes for any given segment.
2021-11-02 12:38:42 +05:30
Katya Macedo 5e1dc843d1
Fix quickstart link (#11864) 2021-11-02 13:27:53 +08:00
Maytas Monsereenusorn ba2874ee1f
Support changing query granularity in Auto Compaction (#11856)
* add queryGranularity

* fix checkstyle

* fix test
2021-11-01 15:18:44 -07:00
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
2021-10-30 22:46:24 +05:30
Maytas Monsereenusorn 33d9d9bd74
Add rollup config to auto and manual compaction (#11850)
* add rollup to auto and manual compaction

* add unit tests

* add unit tests

* add IT

* fix checkstyle
2021-10-29 10:22:25 -07:00
Gian Merlino fc95c92806
Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124)
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.

This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not
  required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that
  had it. (Not all of them did, and that's OK, because it wasn't necessary
  anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default)
  in "useOffheap" mode (also non-default). So you have to go pretty deep
  into the wilderness to get this code to activate in production. It is
  never used during ingestion.
- Its existence complicates developer efforts to reason about how
  aggregators get used, because the way it uses buffer aggregators is so
  different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I do believe that the given way the offheap incremental index
works, it actually didn't require buffer aggregators to be thread-safe.
It synchronizes on "aggregate" and doesn't call "get" until it has
stopped calling "aggregate". Nevertheless, this is a bother to think about,
and for the above reasons I think it makes sense to remove the code anyway.

* Remove things that are now unused.

* Revert removal of getFloat, getLong, getDouble from BufferAggregator.

* OAK-related warnings, suppressions.

* Unused item suppressions.
2021-10-26 08:05:56 -07:00
Sergio Ferragut 000a5551fa
docker mem reqs (#11827)
* docker mem reqs

* Update docs/tutorials/docker.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Sergio Ferragut <sergio.ferragut@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-25 12:23:25 -07:00
Gian Merlino 8276c031c5
Add druid.sql.approxCountDistinct.function property. (#11181)
* Add druid.sql.approxCountDistinct.function property.

The new property allows admins to configure the implementation for
APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode.

The motivation for adding this setting is to enable site admins to
switch the default HLL implementation to DataSketches.

For example, an admin can set:

  druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL

* Fixes

* Fix tests.

* Remove erroneous cannotVectorize.

* Remove unused import.

* Remove unused test imports.
2021-10-25 12:16:21 -07:00
Kashif Faraz abac9e39ed
Revert permission changes to Supervisor and Task APIs (#11819)
* Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)"

This reverts commit f2d6100124.

* Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)"

This reverts commit 6779c4652d.

* Fix docs for the reverted commits

* Fix and restore deleted tests

* Fix and restore SystemSchemaTest
2021-10-25 14:50:38 +05:30
Charles Smith 10c5fa93f1
remove dupe sentence (#11821) 2021-10-25 14:48:20 +05:30
Victoria Lim 43103632fb
Docs - add description on time origin (#11826)
* add description on time origin

* reorder parameter descriptions

* add example of origin value
2021-10-22 14:57:13 -07:00
Charles Smith 938c1493e5
edits to kafka inputFormat (#11796)
* edits to kafka inputFormat

* revise conflict resolution description

* tweak for clarity

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* style fixes

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-10-15 14:01:10 -07:00
Charles Smith 6089a168ea
Docs - update dynamic config provider topic (#11795)
* update dynamic config provider

* update topic

* add examples for dynamic config provider:

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-10-14 17:51:32 -07:00
Arun Ramani b6b42d3936
Minor processor quota computation fix + docs (#11783)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

* Minor processor quota computation fix + docs

* Address comments

* Address comments

* Fix spellcheck

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-08 22:52:03 -05:00
Victoria Lim 42e44269be
Docs update for druid-basic-security (#11782)
* update druid-basic-security

* typo

* revisions from review
2021-10-08 14:45:09 -07:00
Kashif Faraz c2c724c065
Fix docs to explain that WRITE permissions do not include READ (#11785)
* Fix docs to explain that WRITE and READ are exclusive

* Fix indentation

* Use suggested doc style
2021-10-08 14:10:20 -07:00
Charles Smith 3ecbd3aec4
docs for changes to authorization in #11718 and #11720 (#11779)
* security recommendation

* Update docs/operations/security-overview.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update security-user-auth.md

add newline

* Update docs/operations/security-overview.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update security-overview.md

add suggestion for environment variable dynamic config provider

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
2021-10-08 14:04:04 -07:00
Kashif Faraz f2d6100124
Require Datasource WRITE authorization for Supervisor and Task access (#11718)
Follow up PR for #11680

Description
Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE
authorization even if they are purely informative.

Changes
Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks"
Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history
Check Datasource for all Indexing Task APIs
2021-10-08 10:39:48 +05:30
Katya Macedo 45d0ecbefb
clarify hadoop input paths (#11781)
Co-authored-by: Katya Macedo <katya.macedo@imply.io>
2021-10-07 20:22:51 -07:00
lokesh-lingarajan ad6609a606
Kafka Input Format for headers, key and payload parsing (#11630)
### Description

Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights.

PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats.

We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload.

This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row.

Lets look at a sample input format from the above discussion

"inputFormat":
{
    "type": "kafka",     // New input format type
    "headerLabelPrefix": "kafka.header.",   // Label prefix for header columns, this will avoid collusions while merging columns
    "recordTimestampLabelPrefix": "kafka.",  // Kafka record's timestamp is made available in case payload does not carry timestamp
    "headerFormat":  // Header parser specifying that values are of type string
    {
        "type": "string"
    },
    "valueFormat": // Value parser from json parsing
    {
        "type": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [...]
        }
    },
    "keyFormat":  // Key parser also from json parsing
    {
        "type": "json"
    }
}

Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. 

KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. 

"headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload.

Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch.

Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key".

## KafkaInputFormat Class: 
This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases.

During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 08:56:27 -07:00
Charles Smith 8fd17fe0af
fix a few typos in Kinesis doc (#11776) 2021-10-06 19:43:20 -07:00
Lucas Capistrant 1930ad1f47
Implement configurable internally generated query context (#11429)
* Add the ability to add a context to internally generated druid broker queries

* fix docs

* changes after first CI failure

* cleanup after merge with master

* change default to empty map and improve unit tests

* add doc info and fix checkstyle

* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
2021-10-06 09:02:41 -07:00
Kashif Faraz b688db790b
Add Broker config `druid.broker.segment.ignoredTiers` (#11766)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.

This config is useful when you want a completely isolated tier amongst many other tiers.

Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm

If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]
2021-10-06 10:06:32 +05:30
Frank Chen 104c9a07f0
Fix broken anchor and heading levels in Kafka/Kinesis ingestion (#11748)
* Fix broken anchor and heading levels

* Fix CI
2021-10-05 19:30:50 -07:00
Charles Smith 621e5ac63f
docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565)
* docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor

* Update docs/configuration/index.md
2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn f60b3b3bab
fix doc (#11772) 2021-10-05 15:42:11 -07:00
Victoria Lim a31d99fb37
update docs with X-Druid-SQL-Query-Id (#11761)
* update docs with X-Druid-SQL-Query-Id

* review comments

* update header description

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-06 00:15:05 +07:00
Caroline1000 ffbe303828
Update balancer strategy recommendations (#11759)
* Update balancer strategy recommendations

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-10-05 09:47:37 -07:00
Vaibhav 3c4bba1478
Update kinesis-ingestion.md (#11767)
* Update kinesis-ingestion.md

It seems that we are declaring (a final int) recordsPerFetch as 400 and fetchDelayMillis as 0 in https://github.com/implydata/druid/blob/imply-2021.09/extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskIOConfig.java#L36

```
public static final int DEFAULT_RECORDS_PER_FETCH = 4000;
public static final int DEFAULT_FETCH_DELAY_MILLIS = 0;
```

updating `recordsPerFetch` and `fetchDelayMillis` to actual default values as hardcoded above .

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-04 11:26:53 -07:00
sthetland d02d2d9d56
Design/architecture doc touchups (#11762)
* rearrange design content

* casing consistency

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-04 11:09:35 -07:00
Maytas Monsereenusorn 129911a20e
Add documentations for config to filter internal Druid-related messages from error response (#11755)
* add doc

* add doc

* address comments

* fix typo

* address comments
2021-10-01 17:49:02 +07:00
Kashif Faraz c641657bae
Fix router documentation for `druid.router.sql.enable` (#11716)
* Rename field, fix router documentation

* Add more lines to doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-28 22:54:13 +05:30
Clint Wylie 5de26cf6d9
add optional system schema authorization (#11720)
* add optional system schema authorization

* remove unused

* adjust docs

* doc fixes, missing ldap config change for integration tests

* style
2021-09-21 13:28:26 -07:00
Lucas Capistrant 5c3f3da146
Add handoff wait time to IngestionStatsAndErrorsTaskReportData (#11090)
* Add handoff wait time to ingestion stats report. Refactor some code for batch handoff

* fix checkstyle

* Add assertion to AbstractITBatchIndexTask to make sure report reflects wait for segments happened

* add docs to the task reports section of doc
2021-09-20 22:48:44 -07:00
Peter Marshall abd19a8896
Docs - SYS query examples (#11673)
* Update sql.md

Added two example queries and adjusted formatting of one that was already there

* Update docs/querying/sql.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/querying/sql.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/querying/sql.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/querying/sql.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update sql.md

Co-authored-by: Frank Chen <frankchen@apache.org>
2021-09-17 08:27:34 -07:00
Clint Wylie 5e092ccb9b
add MV_FILTER_ONLY, MV_FILTER_NONE, ListFilteredVirtualColumn (#11650)
* add MV_FILTER_ONLY SQL function, and list filter virtual column

* MV_FILTER_NONE and more tests

* formatting

* o yeah, forgot can do easy thing

* style

* hmm why was that there

* test filtering on virtual column

* style

* meh

* do it right

* good bot
2021-09-16 09:31:53 -07:00
Charles Smith 1ae1bbfc4f
docs: delete / cancel query (#11708)
* draft delete query

* Update docs/querying/sql.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* address comments

* Update docs/querying/sql.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/querying/sql.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update sql.md

fix port for router

* Update sql.md

remove authorization until it is 403

* Update sql.md

add 403 message

Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2021-09-15 20:26:04 -07:00
Peter Marshall ee009ec18e
Docs - ingestion task log config and process (#11678)
* Update index.md

Moved H4s underneath the H3 for the task log location and added hyperlinks.

* Update tasks.md

Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead.

* Update tasks.md

.html > .md

* Update docs/ingestion/tasks.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2021-09-13 15:49:09 -07:00
Charles Smith f9329fbf9e
add clarification for maxSubqueryRows (#11687)
* add clarification for maxSubqueryRows
2021-09-13 11:49:30 -07:00
Suneet Saldanha 531d11abaf
Update description of batchProcessingMode (#11686)
* Update description of batchProcessingMode 

Update the description to explicitly mention a released version of Druid that the original version was referencing

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-10 16:55:48 -07:00
Peter Marshall f16cd2a815
Docs - granularities link back to segmentGranularity (#11672)
* Update granularities.md

Link-back to the ingestion spec as well as Native queries plus examples.

* Update docs/querying/granularities.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/granularities.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-10 10:40:11 -07:00
Agustin Gonzalez 9efa6cc9c8
Make persists concurrent with adding rows in batch ingestion (#11536)
* Make persists concurrent with ingestion

* Remove semaphore but keep concurrent persists (with add) and add push in the backround as well

* Go back to documented default persists (zero)

* Move to debug

* Remove unnecessary Atomics

* Comments on synchronization (or not) for sinks & sinkMetadata

* Some cleanup for unit tests but they still need further work

* Shutdown & wait for persists and push on close

* Provide support for three existing batch appenderators using batchProcessingMode flag

* Fix reference to wrong appenderator

* Fix doc typos

* Add BatchAppenderators class test coverage

* Add log message to batchProcessingMode final value, fix typo in enum name

* Another typo and minor fix to log message

* LEGACY->OPEN_SEGMENTS, Edit docs

* Minor update legacy->open segments log message

* More code comments, mostly small adjustments to naming etc

* fix spelling

* Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage

* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-09-08 13:31:52 -07:00
Jihoon Son 7e90d00cc0
Configurable maxStreamLength for doubles sketches (#11574)
* Configurable maxStreamLength for doubles sketches

* fix equals/hashcode and it test failure

* fix test

* fix it test

* benchmark

* doc

* grouping key

* fix comment

* dependency check

* Update docs/development/extensions-core/datasketches-quantiles.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-31 14:56:37 -07:00
zhangyue19921010 6d14ea2d14
Dynamic auto scale Kinesis-Stream ingest tasks (#10985)
* ready to test

* revert misc.xml

* document kinesis md

* Update docs/development/extensions-core/kafka-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update kafka-ingestion.md

remove leading `

* Update kinesis-ingestion.md

add missing `

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-30 15:44:29 -07:00
Peter Marshall e1d80d05a2
Docs - note when partitioning using concatenated dimensions (#11506)
LGTM

* Update native-batch.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400

* Update native-batch.md

* Fixed broken link + some grammar

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Update native-batch.md

Some grammatical wizardry.

* Update native-batch.md

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Apply suggestions from code review

remove orphaned links

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-30 11:59:24 -07:00
Gian Merlino ec6c6e2d53
Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. (#11549)
* Docs: Clarify segmentMetadata cardinality, minmax, and size behavior.

* Further clarifications.

* Update docs/querying/segmentmetadataquery.md

style update

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-26 15:39:40 -07:00
Charles Smith 9032a0b079
updates Kafka and Kinesis to use . Fixes some typos and other style i… (#11624)
* updates Kafka and Kinesis to use . Fixes some typos and other style issues for Kafka.

* fix spelling

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-26 13:22:30 -07:00
Paul Rogers 1d5438ae7c
Add details to the Docker tutorial (#11463)
* Add details to the Docker tutorial

Added links, explanations and other details to the Docker
tutorial to make it easier for first-time users.

* Fix spelling error

And add "Jupyter" to the spelling dictionary.

* Update docs/tutorials/docker.md

* Update docs/tutorials/docker.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/tutorials/docker.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/tutorials/docker.md

* Update docs/tutorials/docker.md

Co-authored-by: sthetland <steve.hetland@imply.io>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: sthetland <steve.hetland@imply.io>
2021-08-24 08:49:29 -07:00
Jeet Patel adb2f5c884
Add prometheus-emitter docs (#11618)
* Add prometheus-emitter docs

* Update docs/development/extensions.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-24 08:48:03 -07:00
Daegi Kim 59e560e24d
fix for numShards description (#11611)
Co-authored-by: devin-kim <devin.kim@kakaocorp.com>
2021-08-23 14:05:03 -07:00
Charles Smith 66964a261b
fixes syntax for TRIM (#11619)
* fixes syntax for TRIM

* trim erroneous quote

* fix typo
2021-08-23 11:44:19 -07:00
Clint Wylie ec334a641b
MySQL extension with MariaDB connector docs (#11608)
* add docs for mariadb support via mysql extensions

* add logging so you know what druid knows

* homogenize

* spelling

* missed a couple
2021-08-19 01:52:26 -07:00
Maytas Monsereenusorn ce4dd48bb8
Support custom coordinator duties (#11601)
* impl

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* add test

* add test

* add integration tests

* add integration tests

* add more docs

* address comments

* address comments

* address comments

* add test

* fix checkstyle

* fix test
2021-08-19 11:54:11 +07:00
Charles Smith 91cd573472
fixes web console introduction and addresses linking issues (#11609)
* fixes web console introduction and addresses  linking issues

* fix merge conflict
2021-08-18 08:37:05 -07:00
Arvin.Z 504e54402b
update default compression format for bitmap (#11610)
Co-authored-by: azheng <azheng@adobe.com>
2021-08-18 14:54:27 +05:30
Karan Kumar d1bad92880
Made the instructions of adding extra resources as part of extensions simpler (#11577) 2021-08-17 17:33:55 +05:30
imply-jhan 332e68edb5
improve the metric definition (#11602) 2021-08-17 12:31:42 +07:00
Gian Merlino 4e5f9cdacf
Add pushes to DataSketches in SQL docs. (#11578)
* Add pushes to DataSketches in SQL docs.

These notices were already in the native docs, but they were missing
from the SQL docs.

* Grammar fix.
2021-08-16 10:38:56 -07:00
Peter Marshall 8aaefb91e3
Docs - MiddleManager Affinity "strong" definition (#11480)
* Affinity "strong" definition

Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800

* Spelling corrections

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-13 19:17:16 -07:00
sthetland 95c5bc3a6d
Clarify when changes to credentialIterations take effect (#11590)
This change updates doc to clarify when and how a change to druid.auth.authenticator.basic.credentialIterations takes effect: changes apply only to new users or existing users upon changing their password via the credentials API, which may not be the expectation.
2021-08-13 17:02:07 -07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
frank chen e40be0ae28
Add SQL functions to format numbers into human readable format (#10635)
* add binary_byte_format/decimal_byte_format/decimal_format

* clean code

* fix doc

* fix review comments

* add spelling check rules

* remove extra param

* improve type handling and null handling

* remove extra zeros

* fix tests and add space between unit suffix and number as most size-format functions do

* fix tests

* add examples

* change function names according to review comments

* fix merge

Signed-off-by: frank chen <frank.chen021@outlook.com>

* no need to configure NullHandling explicitly for tests

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix tests in SQL-Compatible mode

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Resolve review comments

* Update SQL test case to check null handling

* Fix intellij inspections

* Add more examples

* Fix example
2021-08-13 10:27:49 -07:00
Charles Smith 6524d838d7
Docs refactor of ingestion. Carries #11541 (#11576)
* Docs refactor of ingestion. Carries #11541

* Update docs/misc/math-expr.md

* add Apache license

* fix header, add topics to sidebar

* Update docs/ingestion/partitioning.md

* pick up changes to  and  md from c7fdf1d, #11479

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-13 08:42:03 -07:00
Kashif Faraz aaf0aaad8f
Enable routing of SQL queries at Router (#11566)
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.

This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.

To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
2021-08-13 18:44:39 +05:30
Gian Merlino faebefecae
Docs: add pointers from api-reference to sql docs. (#11548) 2021-08-11 09:00:33 -07:00
Suneet Saldanha 640f63094a
fix little typo (#11573)
* fix little typo

* Update docs/misc/math-expr.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-10 21:43:01 -07:00
Clint Wylie 9af7ba9d2a
STRING_AGG SQL aggregator function (#11241)
* add string_agg

* oops

* style and fix test

* spelling

* fixup

* review stuffs
2021-08-10 13:47:09 -07:00
benkrug bef6f43e3d
Update math-expr.md (#11254)
* Update math-expr.md
2021-08-09 17:46:05 -07:00
frank chen bf5d829b71
Add more guidelines on the use of aliyun-oss-extensions (#11420)
* Add more description

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Update prefixes usage and Add troubleshooting section

* Add endpoint configuration recommendation

* Fix link

* resolve review comments
2021-08-09 17:27:35 -07:00
Charles Smith 941c5ffb05
clarify JVM tmp dir requires execute on files (#11542)
* clarify JVM tmp dir requires execute on files

* code SysMonitor for spellcheck
2021-08-09 17:25:10 -07:00
Paul Rogers 3e7cba738f
Minor edits to architecture page to improve flow (#11465)
* Minor edits to architecture page to improve flow

* Fixed spelling issue
2021-08-09 07:48:29 -07:00
Yi Yuan 59c8430d29
change document (#11545)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-06 07:57:12 -07:00
Peter Marshall 60e3955adb
Docs - clarify datasource API sources (#11489)
* Update api-reference.md

Added note OTBO Druid slack

* Update api-reference.md

Changed to an alternative explanation

* Update api-reference.md

Oops fixed.

* Update docs/operations/api-reference.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/operations/api-reference.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-08-05 11:29:33 -07:00
Suneet Saldanha e423e99997
Update default maxSegmentsInNodeLoadingQueue (#11540)
* Update default maxSegmentsInNodeLoadingQueue

Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100.

An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability.
Since this is the default druid operators need to run into this instability
and then look through the docs to see that the recommended value for a large
cluster is 1000. This change makes it so the default will prevent clusters
from falling over as they grow over time.

* update tests

* codestyle
2021-08-05 11:26:58 -07:00
Maytas Monsereenusorn 3257913737
Improve query error logging (#11519)
* Improve query error logging

* add docs

* address comments

* address comments
2021-08-05 22:51:09 +07:00
Yi Yuan 23d7d71ea5
Add Environment Variable DynamicConfigProvider (#11377)
* add_environment_variable_DynamicConfigProvider

* fix code

* code fixed

* code fixed

* add document

* fix doc

* fix doc

* add more unit test

* fix style

* fix document

* bug fixed

* fix unit test

* fix comment

* fix test

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-04 20:26:58 -07:00
Yi Yuan aa7cb50f24
Add DynamicConfigProvider for Schema Registry (#11362)
* add_DynamicConfigProvider_for_schema_registry

* bug fixed

* add document

* fix document

* fix spot bug

* fix document

* inject ObjectMapper

* add DynamicConfigProviderUtils

* add UT

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-03 13:24:52 -07:00
frank chen 55a01a030a
Clarify that Broker caching for groupBy v2 queries does not work (#11370)
* Add a note

* Update docs/configuration/index.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* clarify that both of non-result level cache and result level cache are not supported

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-08-03 10:01:15 -07:00
Yi Yuan f1e52ab356
add doc (#11531)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-03 12:20:29 +08:00
Victoria Lim 949484728f
docs fix for doubleMean description (#11513)
* fix for doubleMean description

* include quantile aggregator description from Suneet

* update hyperlink to quantiles aggregator
2021-07-30 12:39:44 -07:00
Harini Rajendran 995d99d9e4
add ingest/notices/queueSize metric to give visibility into supervisor notices queue size (#11417) 2021-07-30 07:59:26 -07:00
Yuanli Han b83742179a
Reduce method invocation of reservoir sampling (#11257)
* reduce method invocation of reservoir sampling

* add a dynamic parameter and add benchmark

* rebase
2021-07-30 22:09:50 +08:00
Jonathan Wei 9b250c54aa
Allow kill task to mark segments as unused (#11501)
* Allow kill task to mark segments as unused

* Add IndexerSQLMetadataStorageCoordinator test

* Update docs/ingestion/data-management.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Add warning to kill task doc

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-07-29 10:48:43 -05:00
Peter Marshall 0de1837ff7
Docs - partitioning note re: skew / dim concatenation + nav update (#11488)
* Update native-batch.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400

* Update native-batch.md

* Fixed broken link + some grammar
2021-07-27 09:17:01 -07:00
Kashif Faraz 8a4e27f51d
Select broker based on query context parameter `brokerService` (#11495)
This change allows the selection of a specific broker service (or broker tier) by the Router.

The newly added ManualTieredBrokerSelectorStrategy works as follows:

Check for the parameter brokerService in the query context. If this is a valid broker service, use it.
Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it.
Move on to the next strategy
2021-07-27 20:56:05 +05:30
Peter Marshall 60fdf7a734
Rollup measurement query amended (#11479)
By user request from https://groups.google.com/g/druid-user/c/bFkOtE-1eQg - gives the measure as a floating point instead of an integer.
2021-07-27 06:29:29 -07:00
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Peter Marshall 973e5bf7d0
Docs - HLL lgK tip and slight layout change (#11482)
* HLL lgK and a tip

Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200.  Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated.  Also added a note about using K over 16 being pretty much pointless.

* Corrected spelling

* Create datasketches-hll.md

Put roll-up back to rollup

* Update docs/development/extensions-core/datasketches-hll.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2021-07-26 12:28:53 -07:00
Lucas Capistrant 9767b42e85
Add a new metric query/segments/count that is not emitted by default (#11394)
* Add a new metric query/segments/count that is not emitted by default

* docs

* test the default implementation of the metric

* fix spelling error in docs

* document the fact that query retries will result in additional metric emissions

* update using recommended text from @jihoonson
2021-07-22 17:57:35 -07:00
benkrug 167c45260c
Update druid-vs-kudu.md (#11470)
small typo - "need" to "needed"
2021-07-21 22:58:14 +08:00
Maytas Monsereenusorn 6ce3b6ca2d
Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444)
* fix doc

* address comments

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-07-21 12:48:56 +07:00
Paul Rogers aa8c615ac2
Updates to source and doc build pages (#11464)
* Updates to source and doc build pages.

Clarifies a few points for newbies.

* Fixed spelling error

And added spellcheck info to website README file.
2021-07-20 18:07:34 -07:00
Abhishek Agarwal 94c1671eaf
Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466)
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.

SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects.

SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.
2021-07-21 00:14:19 +05:30
jerryleooo c7fdf1d685
Fix typo in ingestion spec sample (#11433)
* Update index.md

Fix typo in the ingestion spec sample

* fixed more typos
2021-07-19 22:02:21 -07:00
sthetland a366753ba5
Consolidate multi-value dimension doc and highlight configurability (#11428)
* Clarify options for multi-value dims
* Add first example
2021-07-15 10:19:10 -07:00
Maytas Monsereenusorn 8d7d60d18e
Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440)
* fix pendingTaskBased

* fix doc

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments
2021-07-15 06:52:25 +07:00
Maytas Monsereenusorn 05d5dd9289
compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded (#11426)
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* fix test

* fix test
2021-07-13 09:48:06 +07:00
Agustin Gonzalez 7e61042794
Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294)
* Bound memory in native batch ingest create segments

* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that

* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged

* Changed name from RealtimeAppenderator to StreamAppenderator

* Style

* Incorporating tests from StreamAppenderatorTest

* Keep totalRows and cleanup code

* Added missing dep

* Fix unit test

* Checkstyle

* allowIncrementalPersists should always be true for batch

* Added sinks metadata

* clear sinks metadata when closing appenderator

* Style + minor edits to log msgs

* Update sinks metadata & totalRows when dropping a sink (segment)

* Remove max

* Intelli-j check

* Keep a count of hydrants persisted by sink for sanity check before merge

* Move out sanity

* Add previous hydrant count to sink metadata

* Remove redundant field from SinkMetadata

* Remove unneeded functions

* Cleanup unused code

* Removed unused code

* Remove unused field

* Exclude it from jacoco because it is very hard to get branch coverage

* Remove segment announcement and some other minor cleanup

* Add fallback flag

* Minor code cleanup

* Checkstyle

* Code review changes

* Update batchMemoryMappedIndex name

* Code review comments

* Exclude class from coverage, will include again when packaging gets fixed

* Moved test classes to server module

* More BatchAppenderator cleanup

* Fix bug in wrong counting of totalHydrants plus minor cleanup in add

* Removed left over comments

* Have BatchAppenderator follow the Appenderator contract for push & getSegments

* Fix LGTM violations

* Review comments

* Add stats after push is done

* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)

* Update javadocs

* Add thread safety notice to BatchAppenderator

* Further cleanup config

* More config cleanup
2021-07-09 00:10:29 -07:00
Joseph Glanville d5e8d4d680
Avro union support (#10505)
* Avro union support

* Document new union support

* Add support for AvroStreamInputFormat and fix checkstyle

* Extend multi-member union test schema and format

* Some additional docs and add Enums to spelling

* Rename explodeUnions -> extractUnions

* explode -> extract

* ByType

* Correct spelling error
2021-07-06 22:05:41 -07:00
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
Clint Wylie df9b57aa1a
bitwise aggregators, better null handling options for expression agg (#11280)
* bitwise aggregators, better nulls for expression agg

* correct behavior

* rework deserialize, better names

* fix json, share mask
2021-06-25 16:51:16 -07:00
sthetland fd0931d35e
Azure data lake input source (#11153)
* Mention Azure Data Lake

* Make consistent with other entries

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-06-25 15:54:34 -07:00
Hoseung Lee ed0a57e106
Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-06-24 21:54:48 -07:00
Yi Yuan de8daf8139
Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351)
* delete_buildV9Directly_in_kafka_and_kinesis_indexing_service

* delete

* delete them from server

* delete buildV9Directly from hadoop indexing

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-23 16:36:46 -07:00
Clint Wylie bfbd7ec432
fix a bugs related to SQL type inference return type nullability (#11327)
* fix a bunch of type inference nullability bugs

* fixes

* style

* fix test

* fix concat
2021-06-15 12:26:59 -07:00
Charles Smith a1ed3a407d
clarify bySegment is native only (#11331) 2021-06-11 13:48:17 -07:00
Yi Yuan 8de0d36c52
Allow query through router when load moving average extension (#11276)
* init commit

* change NoopQuerySegmentWalker name

* change doc

* move NoopQuerySegmentWalker and add document

* fix doc

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-10 18:46:53 +08:00
Egor Riashin 9047fa3d9c
S3 ingestion can assume role (#10995)
* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* tests fix

* spelling fix

* sts fix

Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>
2021-06-09 16:02:35 +05:30
Yi Yuan 145cf9e5c3
fix document about input format (#11342)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-08 23:44:54 +08:00
frank chen 2ee7e31e5b
Fix syntax error (#11332) 2021-06-07 22:35:02 -07:00
frank chen d5139c9543
Fix permission problems in docker (#11299)
* Create /opt/data to fix permission problem

* eliminate symlink to avoid compatibility problem on AWS Fargate

* Add a workaround section

* Update instruction for named volume

* Use named volume in docker-compose

* Revert some doc change

* Resolve review comments
2021-06-01 17:33:27 -07:00
frank chen e664bfd433
Improve doc of movingAverage (#11262)
* Make doc more directive

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Add limitation

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Suppress spelling check error
2021-05-28 13:10:55 +08:00
frank chen 60843bd11f
Add configuration suggestion to `druid.indexer.storage.type` (#11304) 2021-05-27 06:44:47 -07:00
Xavier Léauté b517c3339b
remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073)
With this change, Druid will only support ZooKeeper 3.5.x and later.

In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x
(see #10780 for the detailed reasons) 

* remove ZooKeeper 3.4.x compatibility
* exclude additional ZK 3.5.x netty dependencies to ensure we use our version
* keep ZooKeeper version used for integration tests in sync with client library version
* remove the need to specify ZK version at runtime for docker
* add support to run integration tests with JDK 15
* build and run unit tests with Java 15 in travis
2021-05-25 12:49:49 -07:00
Agustin Gonzalez 4ba5738ffb
Add an issues section to deal with common issues when building druid (#11271) 2021-05-21 09:04:51 -07:00
Charles Smith 403dcf5cfb
fixes some typos, edits for style (#11258) 2021-05-21 08:58:39 -07:00
Charles Smith fcb4eaa3d4
add docs for high-churn datasource cleanup (#11245)
* add docs for high-churn datasource cleanup

* fix most comments except for task log

* address  comments

* update strategy recommendation

* address addtional comments

* fix

* address comments

* address comments from @sthetland
2021-05-20 09:48:42 -07:00
Clint Wylie 3649c608d2
array handling improvements (#11233)
* fix jdbc array handling, split handling for some array and multi value operator, split and add more tests

* formatting
2021-05-13 18:50:32 -07:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Agustin Gonzalez 8e5048e643
Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123)
* Avoid mapping hydrants in create segments phase for native ingestion

* Drop queriable indices after a given sink is fully merged

* Do not drop memory mappings for realtime ingestion

* Style fixes

* Renamed to match use case better

* Rollback memoization code and use the real time flag instead

* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations

* Style

* Log some count stats

* Make sure sinks size is obtained at the right time

* BatchAppenderator unit test

* Fix comment typos

* Renamed methods to make them more readable

* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator

* Missing dependency

* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.

* Replaced concurrent variables with normal ones

* Added   batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path.

* Style fix.

* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.

* Forgot to commit this edited documentation message
2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Charles Smith fae7ebf489
change errant 'none' configuration to 'manual': (#11218) 2021-05-10 22:04:18 -07:00
Clint Wylie 691d7a1d54
SQL timeseries no longer skip empty buckets with all granularity (#11188)
* SQL timeseries no longer skip empty buckets with all granularity

* add comment, fix tests

* the ol switcheroo

* revert unintended change

* docs and more tests

* style

* make checkstyle happy

* docs fixes and more tests

* add docs, tests for array_agg

* fixes

* oops

* doc stuffs

* fix compile, match doc style
2021-05-10 10:13:37 -07:00
frank chen fa113fb4a9
Fix default value (#11220) 2021-05-10 10:11:26 -07:00
Yuanli Han 14f1f2aa76
Fix a broken link in the development doc (#11226) 2021-05-10 16:14:06 +08:00
Yuanli Han 8647040f4d
Allow user to set group.id for Kafka ingestion task (#11147)
* allow user to set group.id for Kafka ingestion task

* fix test coverage by removing deprecated code and add doc

* fix typo

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: frank chen <frankchen@apache.org>

Co-authored-by: frank chen <frankchen@apache.org>
2021-05-09 11:56:19 +08:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Charles Smith cf2cde1d2d
add links to release notes, light refactor of landing page (#11051)
* add links to release notes, light refactor of landing page

* Update docs/design/index.md
2021-05-07 14:26:47 -07:00
benkrug 49c8307b72
Update datasource.md (#10864)
* Update datasource.md

Change "table" to "datasource" in join discussion: This means that all datasources
other than the leftmost "base" table must fit in memory.

According to docs on datasources, "datasource" is the more general term, and a table is a kind of datasource.  In the context here, then, "datasource" is applicable.

* left-hand table -> left-hand datasource

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-05-07 01:14:45 -07:00
Lasse Krogh Mammen 9be2a5cdc2
Add documentation re alphabetical sorted of MV dimensions (#10695) 2021-05-07 01:12:32 -07:00
Maytas Monsereenusorn d73f72e508
Add feature to automatically remove supervisor based on retention period (#11200)
* add auto clean up

* add test

* add test

* fix test

* Address comments

* Address comments
2021-05-06 22:25:23 -07:00
imply-jbalik 4adb121234
Fix example of prefixes for Cloud Input Sources(eg. S3) (#11192)
Fixed a syntax error in "prefix" lines in docs/ingestion/native-batch.md

S3 requires a trailing slash for directory like structures, so this updates the examples to include the trailing slashes.
2021-05-05 21:19:31 -07:00
Yuanli Han 34169c8550
fix doc (#11202)
(cherry picked from commit ffb3c049726b5e461c6f7f8b6f4b75d2cb907dcc)
2021-05-05 06:17:07 -07:00
Lucas Capistrant bb3c810b36
Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135)
* lay the groundwork for throttling replicant loads per RunRules execution

* Add dynamic coordinator config to control new replicant threshold.

* remove redundant line

* add some unit tests

* fix checkstyle error

* add documentation for new dynamic config

* improve docs and logs

* Alter how null is handled for new config. If null, manually set as default
2021-05-05 07:39:36 -05:00
Clint Wylie 554f1ffeee
ARRAY_AGG sql aggregator function (#11157)
* ARRAY_AGG sql aggregator function

* add javadoc

* spelling

* review stuff, return null instead of empty when nil input

* review stuff

* Update sql.md

* use type inference for finalize, refactor some things
2021-05-03 22:17:10 -07:00
imply-jbalik 6f7701e742
fixed array syntax (#11191) 2021-05-03 21:38:16 -07:00
sthetland ca1412d574
Reduce visibility of Tranquility documentation (#11134)
* reduce visibility of tranquility doc

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-05-03 16:48:24 -07:00
Maytas Monsereenusorn 84aac4832d
Add feature to automatically remove rules based on retention period (#11164)
* Add feature to automatically remove rules based on retention period

* Add feature to automatically remove rules based on retention period

* address comments
2021-05-03 11:50:45 -07:00
benkrug fdab95ea99
Update index.md (#11174)
tiny change for readability
2021-04-30 09:40:19 -07:00
Jeet Patel 7139c60868
Change the `id` for `kubernetes` doc link to work (#11176)
* Change the `id` for doc link to work

* Added `druid-kubernetes-extensions` to the list
2021-04-28 10:12:28 -07:00
Jeet Patel 31042cddf5
Fix `defaultMetricDimensions.json` path link (#11156) 2021-04-24 11:08:03 +08:00
Gian Merlino a47c0d2579
Clarify meaning of "root-level fields" in the documentation. (#11143) 2021-04-24 11:06:08 +08:00
Clint Wylie 57ff1f9cdb
expression aggregator (#11104)
* add experimental expression aggregator

* add test

* fix lgtm

* fix test

* adjust test

* use not null constant

* array_set_concat docs

* add equals and hashcode and tostring

* fix it

* spelling

* do multi-value magic for expression agg, more javadocs, tests

* formatting

* fix inspection

* more better

* nullable
2021-04-22 18:30:16 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Charles Smith 09dcf6aa36
fix syntax error for loadstatus api (#11136) 2021-04-20 14:17:20 +08:00
Gian Merlino cb7c6ac314
Doc updates for union datasources. (#11103)
The main one is updating datasources.md to talk about SQL. (It still said
that table unions are not supported in SQL.) Also, this doc update adds
some clarifying details on limitations.
2021-04-14 18:18:14 -07:00
Charles Smith b51632b0bf
Update security overview with additional recommendations (#11016)
* updatee security overview with additional recommendations for improved security

* address first set of review questions

* Update docs/operations/security-overview.md

* Update docs/operations/security-overview.md

* apply changes from review

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update security-overview.md

fix additional comments & typos cc: @suneet-s, @jihoonsoon

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-04-14 08:58:17 -07:00
Maytas Monsereenusorn f968400170
Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078)
* Add config to skip storing audit payload if exceed limit

* fix checkstyle

* change config name

* skip null fields for audit payload

* fix checkstyle

* address comments

* fix guice

* fix test

* add tests

* address comments

* address comments

* address comments

* fix checkstyle

* address comments

* fix test

* fix test

* address comments

* Address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-04-13 20:18:28 -07:00
bergmt2000 f60d8ea1c3
Update index.md (#11105)
Fix json typo in readme for granularitySpec in compaction config example
2021-04-13 16:26:36 +08:00
Yi Yuan 0e0c1a1aaf
add protobuf inputformat (#11018)
* add protobuf inputformat

* repair pom

* alter intermediateRow to type of Dynamicmessage

* add document

* refine test

* fix document

* add protoBytesDecoder

* refine document and add ser test

* add hash

* add schema registry ser test

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-04-12 22:03:13 -07:00
Yi Yuan d0a94a8c14
add avro stream input format (#11040)
* add avro stream input format

* bug fixed

* add document

* doc fix

* change doc

* add integretion test

* bug fixed

* bug fixed

* add string as binary getter

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-04-12 21:53:41 -07:00
zhangyue19921010 95b82dd325
Add missing API references for coordinator (#10967)
* add miss API references for coordinator

* add miss API references for coordinator

* add miss API references for coordinator

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-04-09 18:20:47 -07:00
Maytas Monsereenusorn 4576152e4a
Make dropExisting flag for Compaction configurable and add warning documentations (#11070)
* Make dropExisting flag for Compaction configurable

* fix checkstyle

* fix checkstyle

* fix test

* add tests

* fix spelling

* fix docs

* add IT

* fix test

* fix doc

* fix doc
2021-04-09 00:12:28 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compcation TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erreneous newline
2021-04-08 21:03:00 -07:00
sthetland dd4c5f2a17
Update using-caching.md (#11069) 2021-04-08 16:48:26 -05:00
sthetland fb6751fa45
Fix old broken link (#11048)
* link check fixes

* updated link target

* Update aggregations.md

* spelling error
2021-04-07 20:40:50 -07:00
Himanshu a0d52c3def
k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value (#10961)
* k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value

* update doc

* fix test
2021-04-07 17:45:28 -07:00
Cameron Teasdale 786207995e
add minimal documentation for expression filters (#11045)
* add minimal documentation for expression filters

* Update docs/querying/filters.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/filters.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/filters.md

Co-authored-by: Alejandro Lujan <andanthor@gmail.com>

* Update docs/querying/filters.md

Co-authored-by: Alejandro Lujan <andanthor@gmail.com>

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Alejandro Lujan <andanthor@gmail.com>
2021-04-07 16:58:28 -07:00
Abhishek Agarwal 0df0bff44b
Enable multiple distinct aggregators in same query (#11014)
* Enable multiple distinct count

* Add more tests

* fix sql test

* docs fix

* Address nits
2021-04-07 00:52:19 -07:00
Jihoon Son cc12a57034
Enforce allow list for JDBC properties by default (#11063)
* Enforce allow list for JDBC properties by default

* fix tests
2021-04-06 19:46:19 -07:00
zachjsh 8cf1e83543
Add paramter to loadstatus API to compute underdeplication against cluster view (#11056)
* Add paramter to loadstatus API to compute underdeplication against cluster view

This change adds a query parameter `computeUsingClusterView` to loadstatus apis
that if specified have the coordinator compute undereplication for segments based
on the number of services available within cluster that the segment can be replicated
on, instead of the configured replication count configured in load rule. A default
load rule is created in all clusters that specified that all segments should be
replicated 2 times. As replicas are forced to be on separate nodes in the cluster,
this causes the loadstatus api to report that there are under-replicated segments
when there is only 1 data server in the cluster. In this case, calling loadstatus
api without this new query parameter will always result in a response indicating
under-replication of segments

* * fix exception mapper

* * Address review comments

* * update external API docs

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* * update more external docs

* * update javadoc

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-04-05 00:02:43 -04:00
Clint Wylie 470d659ca0
add documentation for coordinator dynamic configuration (#11052) 2021-04-02 22:01:43 -07:00
Jihoon Son cfcebc40f6
Allow list for JDBC connection properties to address CVE-2021-26919 (#11047)
* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11
2021-04-01 17:30:47 -07:00
Maytas Monsereenusorn d7f5293364
Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (#11025)
* Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer.

* Add option to drop segments after ingestion

* fix checkstyle

* add tests

* add tests

* add tests

* fix test

* add tests

* fix checkstyle

* fix checkstyle

* add docs

* fix docs

* address comments

* address comments

* fix spelling
2021-04-01 12:29:36 -07:00
Charles Smith 67dd61e6e4
remove outdated info from faq (#11053)
* remove outdated info from faq
2021-04-01 08:13:29 -07:00
Parag Jain b35486fa81
request logs through kafka emitter (#11036)
* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null
2021-04-01 11:31:32 +05:30
Lasse Krogh Mammen 782a1d4e6c
Add Calcite Avatica protobuf handler (#10543) 2021-03-31 12:46:25 -07:00
Tushar Raj 6789ed0a05
Update reset-cluster.md (#10990)
fixed Error: Could not find or load main class org.apache.druid.cli.Main
2021-03-29 20:38:35 -07:00
Charles Smith 8544d29bc7
remove experimental from Kinesis with caveats (#10998)
* remove experimental from Kinesis with caveats

* add suggested known issue

* spelling fixes
2021-03-29 13:57:58 -07:00
Parag Jain 2fdc313e4d
GCS lookup support (#11026)
* GCS lookup support

* checkstyle fix

* review comments

* review comments

* remove unused import
2021-03-30 01:40:41 +05:30
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* upadtes from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularityspec to example
2021-03-24 11:41:44 -07:00
Atul Mohan 3d7e7c2c83
Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213)
* Skip queue removal on timeout

* Clarify error

* Add new config to control replication

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2021-03-17 11:34:05 -07:00
Mohammadamin Karbasforushan dfad38d561
Fix unclear documentation of human readable byte (#10825)
* Fix unclear documentation of human readable byte

Follows https://github.com/apache/druid/pull/10203 ;
See https://github.com/apache/druid/pull/10203#issuecomment-771080634 .

* Fix sentence style

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-11 00:01:38 -08:00
benkrug 7f96ca8f5e
Update topnquery.md (#10944)
minor edits of the English, no meanings changed (imo)
2021-03-09 15:19:02 -08:00
Yi Yuan 36e86a2880
Add protobuf schema registry (#10839)
* dd_protobuf_schema_registry

* change licese

* delete some annotation

* nodify tests

* delete extra exception

* add licenses

* add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version

* seperate kafka-protobuf-provider

* modify protobuf.md

* refine protobuf.md

* add config and header

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-03-09 15:15:51 -08:00
Tianxin Zhao a57c28e9ce
prometheus metric exporter (#10412)
* prometheus-emitter

* use existing jetty server to expose prometheus collection endpoint

* unused variables

* better variable names

* removed unused dependencies

* more metric definitions

* reorganize

* use prometheus HTTPServer instead of hooking into Jetty server

* temporary empty help string

* temporary non-empty help.  fix incorrect dimension value in JSON (also updated statsd json)

* added full help text.  added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation

* added documentation for prometheus emitter

* safety for invalid labelNames

* fix travis checks

* Unit test and better sanitization of metrics names and label values

* add precondition to check namespace against regex

* use precompiled regex

* remove static imports. fix metric types

* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()

* Update regex for label-value replacements to allow internal numeric values.  Additional tests

* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName

* fixes version in extensions-contrib/prometheus-emitter

* fix style guide errors

* update import ordering

* add another word to website/.spelling

* remove unthrown declared exception

* remove unused import

* Pushgateway strategy for metrics

* typo

* Format fix and nullable strategy

* Update pom file for prometheus-emitter

* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway

* Syntax fix

* Dimension label regex include numeric character back, fix previous commit

* bump prometheus-emitter pom dev version

* Remove scheduled task inside poen that push metrics

* Fix checkstyle

* Unit test coverage

* Unit test coverage

* Spelling

* Doc fix

* spelling

Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
2021-03-09 14:37:31 -08:00
Abhishek Agarwal c66951a59e
Add flag in SQL to disable left base filter optimization for joins (#10947)
* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
2021-03-09 13:07:34 -08:00
Charles Smith 0f81ce32a0
refactor query caching docs (#10848)
* refactor query caching

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* add description for context link

* accept suggestions

* reword, rework some awkward language

* incorporate feedback, fix errors

* add back perf considerations

* Apply suggestions from code review

applying @suneet-s 's changes

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update caching.md

fix link

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-03-08 22:25:48 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
zhangyue19921010 bddacbb1c3
Dynamic auto scale Kafka-Stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test fiel function

* change codes and add docs based on capistrant reviewed

* midify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there &&  Make autoscaling algorithm configurable and scalable.

* fix ci failed

* revert msic.xml

* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NLP

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when create allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-06 14:36:52 +05:30
Jihoon Son 16acd6686a
Remove stale 'namespace' config for JDBC lookups from doc (#10886)
* Remove stale 'namespace' config for JDBC lookups from doc and web-console

* revert webconsole change

* address comments
2021-03-04 17:16:34 -08:00
Atul Mohan be2ac8d6ce
Document type inference issues with dynamic params in SQL (#10801)
* Clarify docs

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-04 03:48:11 -08:00
spinatelli 99198c02af
Add config and header support for confluent schema registry. (#10314)
* Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096)

* Add Eclipse Public License 2.0 to license check

* Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests

* Add spelling exception and remove unused dependency

* Use non-deprecated getSchemaById() and remove duplicated license entry

* Update docs/ingestion/data-formats.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Added check for schema being null, as per Confluent code

* Missing imports and whitespace

* Updated unit tests with AvroSchema

Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de>
Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-02-27 14:25:35 -08:00
Charles Smith 573de3bc0d
clarify security requirements around HTTPInputSource (#10914)
* clarify security requirements around HTTPInputSource

* explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-02-26 09:37:47 -08:00
zachjsh 67eff4110d
Improve Druid ldap auth documentation (#10915)
* Improve Druid ldap auth documentation

Improved the ldap auth docs by clarifying that the object classes and
attributes noted are specific to Microsoft Active Directory, and could
be different depending on the specific ldap server being used. Also
emphasized the importance of the memberOf field and noted that the
step about adding users to roles is only needed in certain circumstances.

* * add another note

* Apply suggestions from code review

Co-authored-by: sthetland <steve.hetland@imply.io>

* * simplify

* * Address review comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-02-24 15:28:41 -08:00
Clint Wylie f34c6eb3c0
add druid jdbc handler config for minimum number of rows per frame (#10880)
* add druid jdbc handler config for minimum number of rows per frame

* javadocs and docs adjustments

* spelling

* adjust docs per review with minor tweaks

* adjust more
2021-02-23 02:11:04 -08:00
Clint Wylie cbbef80c7f
add SQL operators for bitwise expressions (#10823)
* add SQL operators for bitwise expressions

* more test

* fix spelling

* more tests
2021-02-18 20:56:33 -08:00
sthetland 1e40f51e65
Fix example names of security artifacts in docs (#10882)
* replacing example names

* unrelated typos

* unintended changes

* a few more typo fixes
2021-02-16 14:58:50 -08:00
Jihoon Son 1ec3f0bd73
Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535)" (#10871)
This reverts commit 6b14bdb3a5.
2021-02-09 17:51:26 -08:00
Charles Smith 41a0c1d4b5
reword appendToExisting for clarity. add failure behavior (#10748) 2021-02-08 13:00:01 -08:00
Jihoon Son ac41e41232
Update doc for query errors and add unit tests for JsonParserIterator (#10833)
* Update doc for query errors and add unit tests for JsonParserIterator

* static constructor for convenience

* rename method
2021-02-05 02:55:32 -08:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Clint Wylie 2ce7b3dcf4
bitwise math function expressions (#10605)
* expressions: adding bitwise expressions

* double handling and vectorization

* move conversion to Evals

* revert unintended changes

* less magic, split convert functions, fix parser for funny exponent doubles

* fix spelling exceptions list

* more spelling

* fix grammar, add more test, fix docs

* fix docs

Co-authored-by: Max Kaplan <max@maxkaplan.me>
2021-01-28 11:16:53 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
Himadri Singh 1c1b396eaa
AWS Web Identity / IRSA Support (#10541)
* AWS Web Identity Support

required for AWS IRSA

* Update kinesis-ingestion.md

* disabling coverage tests

https://github.com/apache/druid/pull/10541#issuecomment-737558213

* exclude coverage

* Update licenses.yaml
2021-01-25 18:44:02 +05:30
Charles Smith 99494e3d16
suggest index parallel for native batch reindexing > 1GB (#10788) 2021-01-22 21:54:28 -08:00
zhangyue19921010 bf1d1d583b
modify (#10778)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 09:20:13 -08:00
zhangyue19921010 2837a9b62f
[Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661)
* done

* done

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:23:08 -08:00
Abhishek Agarwal f66fdbfa5d
add offsetFetchPeriod to kinesis ingestion doc (#10734) 2021-01-08 14:19:26 -08:00
Himanshu c7b1212a43
AWS RDS token based password provider (#9518)
* refresh db pwd

* aws iam token password provider

* fix analyze-dependencies build

* fix doc build

* add  ut for BasicDataSourceExt

* more doc updates

* more  doc update

* moving aws  token password  provider to new extension

* remove duplicate changes

* make  all config inline

* extension docs

* refresh db  password  in SQL Firehose code path as well

* add ut

* fix build

* add new extension to distribution

* rds lib is not provided

* fix license build

* add version to license

* change parent version to 0.19.0-snapshot

* address review comments

* fix core/ code coverage

* Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* address review comments

* fix spellchecker

* remove inadvertant website file change

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-01-06 21:15:29 -08:00
Makdon f9fc1892d1
Typo: missing comma in json (#10711) 2021-01-06 13:49:50 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Himanshu d2e6240cac
k8s-int-test-build: zk-less druid cluster and http based segment/task managment (#10686)
* zk-less druid cluster in k8s build

* attempt to fix build and use http based remote task management

* mm/router logs for debugging

* add default account k8s role and binding for pod, configMap access

* fix issue

* change router port to 8088 for common readinessProbe

* break build_run_k8s_cluster.sh into separate scripts

* revert changes to K8sDruidNodeAnnouncer.java

* k8s extension doc update

* add license to new file

* address review comments

* do not try to load lookups at startup to improve cluster startup time
2021-01-05 18:51:47 -08:00
Charles Smith 797371598d
update syntax for golbal cached uri lookups (#10629) 2020-12-24 09:49:01 -08:00
Xavier Léauté b7a16d08a6
Update Apache Kafka to 2.7.0 (#10701)
- align scala versions to match Kafka
2020-12-22 13:56:00 -08:00
Lucas Capistrant 58ce2e55d8
Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284)
* dynamic coord config adding more balancing control

add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.

* fix checkstyle failure

* Make doc more detailed for admin to understand when/why to use new config

* refactor PR to use a % of segments instead of raw number

* update the docs

* remove bad doc line

* fix typo in name of new dynamic config

* update RservoirSegmentSampler to gracefully deal with values > 100%

* add handler for <= 0 in ReservoirSegmentSampler

* fixup CoordinatorDynamicConfigTest naming and argument ordering

* fix items in docs after spellcheck flags

* Fix lgtm flag on missing space in string literal

* improve documentation for new config

* Add default value to config docs and add advice in cluster tuning doc

* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog

* update jest snapshot after console change

* fix spell checker errors

* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider

* add new config back to web console module after merge with master

* fix ReservoirSegmentSamplerTest

* fix line breaks in coordinator console dialog

* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove

* Make improvements based off of feedback in review

* additional cleanup coming from review

* Add a warning log if limit on segments to consider for move can't be calcluated

* remove unused import

* fix tests for CoordinatorDynamicConfig

* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
2020-12-22 08:27:55 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix thing that must not matter since was already wrong ip, log when test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00
sthetland 6ae8059c09
cleaning up and fixing links (#10528)
* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
2020-12-17 13:37:43 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Himanshu be019760bb
document DynamicConfigProvider for kafka consumer properties (#10658)
* document DynamicConfigProvider for kafka consumer properties

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

* fix doc build

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-12-10 08:24:33 -08:00
Atul Mohan 44df05b8b2
Clarify split hint spec behavior (#10656) 2020-12-09 08:24:32 -06:00
Abhishek Agarwal 4ea1ab8531
Fix links in the grouping function doc (#10654) 2020-12-09 14:56:32 +08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
frank chen c410648630
fix injection failure of StorageLocationSelectorStrategy objects (#10363)
* fix to allow customer storage location selector strategy

* add test cases to check instance of selector strategy

* update doc

* code format

* resolve code review comments

* inject StorageLocation

* fix CI

* fix mismatched license item reported by CI

* change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy

* using a helper method to bind to correct property path
2020-12-08 09:48:31 -08:00
Abhishek Agarwal 26d74b3580
Add grouping_id function (#10518)
* First draft of grouping_id function

* Add more tests and documentation

* Add calcite tests

* Fix travis failures

* bit of a change

* Add documentation

* Fix typos

* typo fix
2020-12-07 11:46:29 -08:00
zhangyue19921010 229b5f359f
Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better (#10551)
* remove build in kafka consumer config :

* modify druid docs of kafka indexing service

* yuezhang

* modify doc

* modify docs

* fix kafkaindexTaskTest.java

* revert uncessary change

* add more logs and modify docs

* revert jdk version

* modify docs

* modify-kafka-version v2

* modify docs

* modify docs

* modify docs

* modify docs

* modify docs

* done

* remove useless import

* change code and add UT

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-03 17:37:59 -08:00
zhangyue19921010 e7e07eab11
[Improve Doc] : Modify the disadvantages of the lazyLoadOnStart feature. (#10608)
* modify docs

* modify docs

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-01 18:33:22 -08:00
frank chen 24f1e35b5d
fix desc of 'required' for granularity property (#10616) 2020-12-01 18:29:51 -08:00
Lucas Capistrant 2560bf0a19
Add new coordinator metrics for coordinator duty runtimes (#10603)
* Add new coordinator metrics for duty runtimes

* fix spelling for a constant variable value

* add comment clarifying why the global runtime metric is emitted where it is

* Remove duty alias in lieu of using the class name for metrics

* fix docs

* CoordinatorStats tests + add duty stats to accumulate() logic
2020-11-29 14:47:35 -08:00
frank chen fe693a4f01
Improve doc and exception message for invalid user configurations (#10598)
* improve doc and exception message

* add spelling check rules and remove unused import

* add a test to improve test coverage
2020-11-23 15:03:13 -08:00
Atul Mohan 111b431c07
Introduce query/timeout/count metric (#10567)
* Add timeout metric

* Add tests
2020-11-20 15:17:26 -08:00
sthetland ba915b7f56
Security overview documentation (#10339)
* initial file

* initial file

* security overview added

* ldap added

* spacing adjustments

* nits

* security graphics and doc review

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* updates frm review

* review comments

* finish up review and light edits

* broken links

* spell check

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
2020-11-19 15:24:58 -08:00
michaelschiff 2f4d6da33f
Updates segment metadata query documentation (#10589)
* updates segment metadata query documentation to be clearer about cardinality estimation

* typo in documentation
2020-11-20 00:08:27 +05:30
zhangyue19921010 1272fb17e5
modify druid.historical.cache.maxEntrySize property in Unified format (#10590)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-11-17 16:36:50 -06:00
Atul Mohan 21e3c4b39c
Add missing docs for timeout exceptions (#10554)
* Add missing docs for timeout exceptions

* Add info on auth failures
2020-11-13 08:45:40 -06:00
Gian Merlino 3436297354
Clarify how ORDER BY works with UNION ALL (#10561)
Hopefully a bit clearer.
2020-11-05 20:12:03 -08:00
Mainak Ghosh d8e5a159e8
Update index.md (#10549)
Removing the extra `_` in the default for middlemanager category
2020-11-03 13:44:47 +05:30
Husky Zeng 9286153145
doc wrong description of configuration (#10546) 2020-11-02 17:57:16 -08:00
Nishant Bangarwa 6b14bdb3a5
Add support for Blacklisting some domains for HTTPInputSource (#10535)
fix inspections

refactor class name

change name

 add allowList as well

distinguish between empty and null list

Fix CI
2020-11-02 21:47:25 +05:30
Pierre Carrier 835b328851
docs/: use tuningConfig (#10540) 2020-10-30 09:39:21 -05:00
Charles Smith 9c51047cc8
Document correlation between credential iterations and query latency (#10532)
use link / heading instead of footnote
2020-10-29 12:47:24 -07:00
Atul Mohan 65a42f9eb1
Update overlord api docs (#10539) 2020-10-29 11:19:12 -05:00
Clint Wylie aa9c0ec650
update quickstart docker-compose example, add to release instructions (#10527)
* update quickstart docker-compose example, add to release instructions

* adjust

* spelling
2020-10-26 23:14:07 -07:00
awelsh93 a966de5319
Add https to druid-influxdb-emitter extension (#9938)
* Add https to druid-influxdb-emitter extension

* address CI failures

* increase test coverage

* tests for being unable to load trustStore

* fix EqualsVerifier test

* fix intellij inspection error

* use try-with-resources when loading trustStore
2020-10-26 19:49:26 -07:00
Abhishek Agarwal 04546b65ec
Additional documentation for query caching (#10503)
* Add documentation for when caching is unsupported

* Minor changes

* Minor doc fix

* Review comments

* Add more details

* Fix spelling check

* Fix doc for union query

* Trailing dot
2020-10-20 13:49:13 -07:00
Maytas Monsereenusorn 3538abd5d0
Make sure all fields in sys.segments are JSON-serialized (#10481)
* fix JSON format

* Change all columns in sys segments to be JSON

* Change all columns in sys segments to be JSON

* add tests

* fix failing tests

* fix failing tests
2020-10-14 13:49:46 -07:00
Maytas Monsereenusorn 9056d113d0
Add docs and integration tests for Auto-compaction snapshot status API (#10510)
* add docs and IT for Auto-compaction snapshot status API

* fix spellings

* fix test

* address comments
2020-10-14 06:42:22 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Joseph Glanville 7ce9ac4548
Fix Avro support in Web Console (#10232)
* Fix Avro OCF detection prefix and run formation detection on raw input

* Support Avro Fixed and Enum types correctly

* Check Avro version byte in format detection

* Add test for AvroOCFReader.sample

Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.

* Document Avro type handling

* Add TS unit tests for guessInputFormat
2020-10-07 21:08:22 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie b95bf444b2
add docs for kinesis lag metrics (#10435) 2020-09-28 13:13:53 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Jonathan Wei cb30b1fe23
Automatically determine numShards for parallel ingestion hash partitioning (#10419)
* Automatically determine numShards for parallel ingestion hash partitioning

* Fix inspection, tests, coverage

* Docs and some PR comments

* Adjust locking

* Use HllSketch instead of HyperLogLogCollector

* Fix tests

* Address some PR comments

* Fix granularity bug

* Small doc fix
2020-09-24 13:47:53 -07:00
Clint Wylie dad69481f0
add light weight version of /druid/coordinator/v1/lookups/nodeStatus (#10422)
* add light weight version /druid/coordinator/v1/lookups/nodeStatus

* review stuffs
2020-09-24 14:36:53 +08:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
sthetland ae247b6e63
Document change in results of groupBy queries with subtotalsSpec (#10405)
* subtotalsSpec results with null values

Document the format change in results of a groupBy query with a subtotalsSpec. This update applies to 0.18 and later.

* Review catches
2020-09-19 10:51:23 -07:00
Mainak Ghosh 14072d3ab0
Adding more dimensions to the audit log entry (#10373)
* Adding more dimensions to the audit log entry

* Making adding payload in audit metric optional

* Changing the name of the parameter to includePayloadAsDimensionInMetric. Adding a unit test

* Fixing the intellij code introspection issues
2020-09-17 18:36:28 -07:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
Jihoon Son 8657b23ab2
Integration tests and docs for auto compaction with different partitioning (#10354)
* Working

* add test

* doc

* fix test

* split other integration test

* exclude other-index from other tests

* doc anchor fix

* adjust task slots and number of merge tasks

* spell check

* reduce maxNumConcurrentSubTasks to 1

* maxNumConcurrentSubtasks for range partitinoing

* reduce memory for historical

* change group name
2020-09-15 11:28:09 -07:00
Suneet Saldanha f71ba6f2c2
Vectorized ANY aggregators (#10338)
* WIP vectorized ANY aggregators

* tests

* fix aggs

* cleanup

* code review + tests

* docs

* use NilVectorSelector when needed

* fix spellcheck

* dont instantiate vectors

* cleanup
2020-09-14 19:44:58 -07:00
Abhishek Agarwal f5e2645bbb
Support SearchQueryDimFilter in sql via new methods (#10350)
* Support SearchQueryDimFilter in sql via new methods

* Contains is a reserved word

* revert unnecessary change

* Fix toDruidExpression method

* rename methods

* java docs

* Add native functions

* revert change in dockerfile

* remove changes from dockerfile

* More tests

* travis fix

* Handle null values better
2020-09-14 09:57:54 -07:00
Curt Buechter e3735602f2
Fix typo (#10385) 2020-09-11 16:31:36 -07:00
Lucas Capistrant 690e070c43
Fix doc for name of dynamic config to pause coordination (#10345) 2020-09-11 08:40:06 -05:00
Abhishek Agarwal a5c46dc84b
Add vectorization for druid-histogram extension (#10304)
* First draft

* Remove redundant code from FixedBucketsHistogramAggregator classes

* Add test cases for new classes

* Fix tests in sql compatible mode

* Typo fix

* Fix comment

* Add spelling

* Vectorize only for supported types

* Rename internal aggregator files

* Fix tests
2020-09-09 13:56:33 -07:00
LightGHLi a3bb6ee4a6
Add missing comma between JSON members in data-formats.md (#10343) 2020-09-03 20:03:06 -07:00
Gian Merlino 5cd7610fb6
SQL support for union datasources. (#10324)
* SQL support for union datasources.

Exposed via the "UNION ALL" operator. This means that there are now two
different implementations of UNION ALL: one at the top level of a query
that works by concatenating subquery results, and one at the table level
that works by creating a UnionDataSource.

The SQL documentation is updated to discuss these two use cases and how
they behave.

Future work could unify these by building support for a native datasource
that represents the union of multiple subqueries. (Today, UnionDataSource
can only represent the union of tables, not subqueries.)

* Fixes.

* Error message for sanity check.

* Additional test fixes.

* Add some error messages.
2020-08-28 07:57:06 -07:00
Gian Merlino 21703d81ac
Fix handling of 'join' on top of 'union' datasources. (#10318)
* Fix handling of 'join' on top of 'union' datasources.

The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.

The main changes are in UnionQueryRunner:

1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".

Together, these enable UnionQueryRunner to "see through" a join.

* Tests.

* Adjust heap sizes for integration tests.

* Different approach, more tests.

* Tweak.

* Styling.
2020-08-26 14:23:54 -07:00
Fernando 69d8645425
Adding supported compression formats for native batch ingestion (#10306)
* Adding supported compression formats for native batch ingestion

* Update docs/ingestion/native-batch.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* fix spellcheck

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: sthetland <steve.hetland@imply.io>
2020-08-26 12:39:48 -07:00
Gian Merlino 91bb27cdf7
Clarify SQL behavior for multi-value dimensions. (#10276)
There are some known inconsistencies between SQL and native that
users should be aware of.
2020-08-25 10:11:16 -07:00
frank chen 028442e75e
Redis cache extension enhancement (#10240)
* support redis cluster

* add 'password', 'database' properties

* test cases passed

* update doc

* some improvements

* fix CI

* add more test cases to improve branch coverage

* fix dependency check for test

* resolve review comments
2020-08-24 10:29:04 +08:00
Gian Merlino 0910d22f48
Add SQL "OFFSET" clause. (#10279)
* Add SQL "OFFSET" clause.

Under the hood, this uses the new offset features from #10233 (Scan)
and #10235 (GroupBy). Since Timeseries and TopN queries do not currently
have an offset feature, SQL planning will switch from one of those to
Scan or GroupBy if users add an OFFSET.

Includes a refactoring to harmonize offset and limit planning using an
OffsetLimit wrapper class. This is useful because it ensures that the
various places that need to deal with offset and limit collapsing all
behave the same way, using its "andThen" method.

* Fix test and add another test.
2020-08-21 14:11:54 -07:00
Jihoon Son b5b3e6ecce
Add maxNumFiles to splitHintSpec (#10243)
* Add maxNumFiles to splitHintSpec

* missing link

* fix build failure; use maxNumFiles for integration tests

* spelling

* lower default

* Update docs/ingestion/native-batch.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* address comments; change default maxSplitSize

* spelling

* typos and doc

* same change for segments splitHintSpec

* fix build

* fix build

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2020-08-21 09:43:58 -07:00
Suneet Saldanha 0891b1f833
Add note about aggregations on floats (#10285)
* Add note about aggreations on floats

Floating point math is known to be unstable. Due to the way aggregators work
across segments it's possible for the same query operating on the same data to
produce slightly different results.

The same problem exists with any aggregators that are not commutative since
the merge order across segments is not guaranteed.

* Also talk about doubles

* Apply suggestions from code review
2020-08-17 13:29:57 -07:00
Vatsal Bajpai ee40d00be1
typo fix from hear to here (#10292)
Should be `There are no other changes that need to be made here`
2020-08-17 07:54:21 -07:00
Xavier Léauté 225490474d
Update Kafka dependencies to 2.6.0 (#10286)
* update Kafka dependencies to Kafka 2.6.0
* switch to Scala 2.13 build of Kafka
* update integration tests
* update Kafka tutorial
2020-08-15 07:56:40 -07:00
Gian Merlino 6cca7242de
Add "offset" parameter to the Scan query. (#10233)
* Add "offset" parameter to the Scan query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Fix constructor call.

* Fix up JSONs.

* Fix call to ScanQuery.

* Doc update.

* Fix javadocs.

* Spotbugs, LGTM suppressions.

* Javadocs.

* Fix suppression.

* Stabilize Scan query result order, add tests.

* Update LGTM comment.

* Fixup.

* Test different batch sizes too.

* Nicer tests.

* Fix comment.
2020-08-13 14:56:24 -07:00
Clint Wylie e053348f74
add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219)
* add isNullable to ColumnCapabilities, ColumnAnalysis

* better builder

* fix segment metadata queries in integration tests

* adjustments

* cleanup

* fix spotbugs

* treat unknown as true in segmentmetadata

* rename to hasNulls, add docs

* fixup

* test the dim indexer selector isNull fix for numeric columns

* fixes

* oof
2020-08-13 14:55:32 -07:00
Gian Merlino d36a0f61da
Clarify documentation on dimensions, dimensionExclusions. (#10265)
In particular: exclusions are ignored if dimensions are set.
2020-08-12 08:06:53 -07:00
Abhishek Radhakrishnan dc16abae34
Vectorization support for long, double, float min & max aggregators. (#10260)
* LongMaxVectorAggregator support and test case.

* DoubleMinVectorAggregator and test cases.

* DoubleMaxVectorAggregator and unit test.

* FloatMinVectorAggregator and FloatMaxVectorAggregator.

* Documentation update to include the other vector aggregators.

* Bug fix.

* checkstyle formatting fixes.

* CalciteQueryTest cases update.

* Separate test classes for FloatMaxAggregation and FloatMniAggregation.

* remove the cannotVectorize for float max/min aggregator in test.

* Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.
2020-08-10 15:18:55 -07:00
Atul Mohan 06539bc828
Set default server.maxsize to the sum of segment cache (#10255)
* Default server.maxsize

* Remove maxsize refs from config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-10 09:21:22 -07:00
Gian Merlino b6aaf59e8c
Add "offset" parameter to GroupBy query. (#10235)
* Add "offset" parameter to GroupBy query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Stabilize GroupBy sorts.

* Fix inspections.

* Fix suppression.

* Fixups.

* Move TopNSequence to druid-core.

* Addl comments.

* NumberedElement equals verification.

* Changes from review.
2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan 34a4113752
Add vectorization support for the longMin aggregator. (#10211)
* Fix minor formatting in docs.

* Add Nullhandling initialization for test to run from IDE.

* Vectorize longMin aggregator.

- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.

* Add longSum to the supported vectorized aggregator implementations.

* Add MIN() long min to calcite query test that can vectorize.

* Add simple long aggregations test.

* Fixup formatting per checkstyle guide.

* fixup and add more tests for long min aggregator.

* Override test for groupBy since timestamps are handled differently.

* Null compatibility check in test.

* Review comment: Add a test case to LongMinAggregationTest.
2020-08-01 15:32:09 -07:00
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properites in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
Jian Wang 271f90f205
Add segment pruning for hash based shard spec (#9810)
* Add segment pruning for hash based partitioning

* Update doc

* Add additional test

* Address comments

* Fix unit test failure

Co-authored-by: Jian Wang <jwang@pinterest.com>
2020-07-30 18:44:26 -07:00
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Clint Wylie 79dffefbf8
add explicit example for jdbc query context on connection properties (#10182)
* add explicit example for jdbc query context on connection properties

* make comment clearer

* Update sql.md

* Update sql.md
2020-07-24 13:43:04 -07:00
mans2singh d4bd6e5207
ingestion and tutorial doc update (#10202) 2020-07-21 17:52:23 -07:00
Joseph Glanville f3023c6058
Fix formatting in druid-pac4j documentation (#10174)
Superfluous column broke table formatting.
2020-07-12 18:51:42 -07:00
Antoine Huret 88d20a61a6
renamed authenticationChain to authenticatorChain (#10143) 2020-07-08 19:58:21 -07:00
Gian Merlino 9587fc0b84
Fix documentation for Kinesis fetchThreads. (#10156)
* Fix documentation for Kinesis fetchThreads

The default was changed in #9819, but the documentation wasn't updated.

* Add 'procs' to spelling.
2020-07-08 19:47:09 -07:00
Jihoon Son 53a2550571
Follow-up for RetryQueryRunner fix (#10144)
* address comments; use guice instead of query context

* typo

* QueryResource tests

* address comments

* catch queryException

* fix spell check
2020-07-08 13:28:11 -07:00
Gian Merlino 11c0da8097
Add availability and consistency docs. (#10149)
* Add availability and consistency docs.

Describes transactional ingestion and atomic replacement. Also, this patch
deletes some bad advice from the javadocs for SegmentTransactionalInsertAction.

* Fix missing word.
2020-07-07 15:22:52 -07:00
Fullstop000 bcf41922ce
Remove unsupported task types in doc (#10111) 2020-07-04 18:13:53 -07:00
Atul Mohan 367eaedbb4
Clarify change in behavior for druid.server.maxSize (#10105)
* Clarify maxSize docs

* Add info about maxSize

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-07-01 22:22:18 -07:00
frank chen 60c6bd5b4c
support Aliyun OSS service as deep storage (#9898)
* init commit, all tests passed

* fix format

Signed-off-by: frank chen <frank.chen021@outlook.com>

* data stored successfully

* modify config path

* add doc

* add aliyun-oss extension to project

* remove descriptor deletion code to avoid warning message output by aliyun client

* fix warnings reported by lgtm-com

* fix ci warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix errors reported by intellj inspection check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc spelling check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix dependency warnings reported by ci

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix warnings reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add package configuration to support showing extension info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add IT test cases and fix bugs

Signed-off-by: frank chen <frank.chen021@outlook.com>

* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add license info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* exclude execution of IT testcases of OSS extension from CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* put the extensions under contrib group and add to distribution

* fix names in test cases

* add unit test to cover OssInputSource

* fix names in test cases

* fix dependency problem reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-01 22:20:53 -07:00
Clint Wylie c5540f46ed
fixes for ranger docs (#10109) 2020-07-01 18:26:41 -07:00
Clint Wylie 477335abb4
update links datasketches.github.io to datasketches.apache.org (#10107)
* update links datasketches.github.io to datasketches.apache.org

* now with more apache

* oops

* oops
2020-07-01 14:56:17 -07:00
Surekha d3497a6581
Filter on metrics doc (#10087)
* add note about filter on metrics to filter docs

* edit doc to include having and filtered aggregator links
2020-06-30 19:52:40 -07:00
Lee Rhodes 7b4edc93fc
Update web address to datasketches.apache.org (#10096) 2020-06-30 19:05:23 -07:00
Yuanli Han fc555980e8
Remove payload field from table sys.segment (#9883)
* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment
2020-06-29 22:20:23 -07:00
Clint Wylie 4a625751e8
Information schema doc update (#10081)
* add docs for IS_JOINABLE and IS_BROADCAST to INFORMATION_SCHEMA docs

* fixes

* oops

* revert noise

* missed one

* spellbot
2020-06-29 21:08:13 -07:00
BIGrey 69f2b1ef00
Correct the position of the double quotation in distinctcount.md file (#10094)
```
"dimensions": "[sample_dim]"
```
should be
```
"dimensions": ["sample_dim"]
```
2020-06-29 20:59:56 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that allows a user which http methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore icode coverage for nitialization classes

* code review
2020-06-29 16:59:31 -07:00
Jian Wang 20fd72bd13
Fix NPE when brokers use custom priority list (#9878) 2020-06-26 17:28:54 -07:00
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioing is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* hansle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Clint Wylie 0f51b3c190
fix dropwizard emitter jvm bufferpoolName metric (#10075)
* fix dropwizard emitter jvm bufferpoolName metric

* fixes
2020-06-25 12:20:25 -07:00
Maytas Monsereenusorn 9be5039f68
Enable query vectorization by default (#10065)
* Enable query vectorization by default

* update docs
2020-06-24 13:08:49 -07:00
sthetland 978b494b46
Druid user permissions (#10047)
* Druid user permissions apply in the console

* Update index.md

* noting user warning in console page; some minor shuffling

* noting user warning in console page; some minor shuffling 1

* touchups

* link checking fixes

* Updated per suggestions
2020-06-23 17:39:48 -07:00
Dylan Wylie 0470fcc9da
change default number of segment loading threads (#9856)
* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-06-23 13:56:44 -07:00
Jianhuan Liu 5600e1c204
fix docs error in hadoop-based part (#9907)
* fix docs error: google to azure and hdfs to http

* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part

* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part

* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part
2020-06-19 23:14:54 -10:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Suneet Saldanha b8a3223f24
Remove changes from #9114 (#10050) 2020-06-18 18:18:12 -07:00
litao a4bd144ebe
fix docs (#9114)
Co-authored-by: tomscut <tomscut@gmail.com>
2020-06-18 09:48:47 -07:00
Maytas Monsereenusorn 1a2620606d
API to verify a datasource has the latest ingested data (#9965)
* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checksyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs
2020-06-16 20:48:30 -10:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
Suneet Saldanha 0035f39e25
lpad and rpad functions match postrges behavior in SQL compatible mode (#10006)
* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for pad -ve len

* address review comments
2020-06-15 10:47:57 -07:00
Jonathan Wei fe2f656427
Fix broadcast rule drop and docs (#10019)
* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services
2020-06-12 02:33:28 -07:00
danc 5da78d13af
Update password-provider.md (#9857) 2020-06-10 09:32:49 -07:00
Atul Mohan 17cf8ea8f2
Add Sql InputSource (#9449)
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-09 12:55:20 -07:00
Lucas Capistrant 2ae1d26aa8
small fixes to configuration documentation (#9975) 2020-06-09 10:31:08 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn 0d22462e07
Document unsupported Join on multi-value column (#9948)
* Document Unsupported Join on multi-value column

* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests
2020-06-03 09:55:52 -10:00
sthetland a33705f0e3
Querying doc refresh tutorial (#9879)
* Update tutorial-query.md

* First full pass complete

* Smoothing over, a bit

* link and spell checking

* Update querying.md

* Review comments; screenshot fixes

* Making ports consistent, pending confirmation 

Switching to the Router port, to make this be consistent with the tutorial ports, but can switch back here and there if it should be 8082 instead.

* Resizing screenshot

* Update querying.md

* Review feedback incorporated.
2020-05-29 14:32:21 -07:00
Surekha ff551ae412
Modify information schema doc to specify correct value of TABLE_CATALOG (#9950) 2020-05-29 10:10:28 -07:00
Maytas Monsereenusorn 6130a834c2
Update doc on tmp dir (java.io.tmpdir) best practice (#9910)
* Update doc on tmp dir best practice

* remove local recommendation
2020-05-26 09:37:01 -07:00
frank chen b91d50044e
add some details to the build doc (#9885)
* update initial build command

* add some details for building

* fix spelling check errors

* fix spelling check warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-05-21 12:35:54 -07:00
Jianhuan Liu 2050f2b00a
fix docs error: google to azure and hdfs to http (#9881) 2020-05-20 10:17:39 -07:00
Joseph Glanville 793f386d6a
Add support for Avro OCF using InputFormat (#9671)
* Add AvroOCFInputFormat

* Support supplying a reader schema in AvroOCFInputFormat

* Add docs for Avro OCF input format

* Address review comments

* Address second round of review
2020-05-16 14:09:12 -07:00
Maytas Monsereenusorn 0a8bf83bc5
Bad plan for table-lookup-lookup join with filter on first lookup and outer limit (#9773)
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* address comments

* address comments

* fix checkstyle

* address comments

* address comments
2020-05-14 16:56:40 -07:00
awelsh93 6f25a84d2e
Add TaskCountStatsMonitor to config docs (#9447) 2020-05-11 14:08:46 -07:00
sthetland ce03f31a73
Clarifying workerThreads and a few other nits (#9804)
* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Clarifying accepted values for URI lookup

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* druid-caffeine-cache ext previously removed

* Sample MaxDirectMemorySize value unrealistic

* Review comments

* fixing links

* spell checking gymnastics

* workerThreads desc slightly expanded

* typo

* Typo

* Reversing Kafka config order

* Changing order of configs for Kinesis

* Trying this again: ioConfig then tuningConfig
2020-05-06 09:05:18 -07:00
Alexander Saydakov 844d626738
added number of bins parameter (#9436)
* added number of bins parameter

* addressed review points

* test equals

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2020-05-04 16:53:09 -07:00
Jian Wang 85dfbb64cb
Update documention for metricCompression (#9811) 2020-05-03 12:56:48 -07:00
sthetland c61365c1e0
Druid Quickstart refactor and update (#9766)
* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* Review comments

* fixing links

* spell checking gymnastics
2020-04-30 12:07:28 -07:00
Aleksei Chumagin 0642f778fa
changed Preview to Apply (#9757) 2020-04-29 09:53:25 -07:00
James Dalton b279e04a31
table fix (#9769) 2020-04-28 11:23:24 -07:00
Francesco Nidito e7e41e3a36
Adding support for autoscaling in GCE (#8987)
* Adding support for autoscaling in GCE

* adding extra google deps also in gce pom

* fix link in doc

* remove unused deps

* adding terms to spelling file

* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT

* GCEXyz -> GceXyz in naming for consistency

* add preconditions

* add VisibleForTesting annotation

* typos in comments

* use StringUtils.format instead of String.format

* use custom exception instead of exit

* factorize interval time between retries

* making literal value a constant

* iter all network interfaces

* use provided on google (non api) deps

* adding missing dep

* removing unneded this and use Objects methods instead o 3-way if in hash and comparison

* adding import

* adding retries around getRunningInstances and adding limit for operation end waiting

* refactor GceEnvironmentConfig.hashCode

* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT

* removing unused config

* adding tests to hash and equals

* adding nullable to waitForOperationEnd

* adding testTerminate

* adding unit tests for createComputeService

* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)

* reverting queryResponseTemplate change

* adding comment for Compute.Builder.build() returning null
2020-04-28 03:13:39 -07:00
Gian Merlino 4087a015e8
Datasource doc structure adjustments. (#9716)
- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.
2020-04-23 16:04:59 -07:00
Clint Wylie e677c62484
document useFilterCNF query context parameter (#9647)
* document useFilterCNF query context parameter

* move context key to QueryContexts

* Update .spelling
2020-04-16 22:12:20 -07:00
Clint Wylie b89ad49396
disable group by config applyLimitPushDownToSegment by default (#9711)
* disable group by config applyLimitPushDownToSegment by default

* document
2020-04-16 03:03:35 -07:00
Gian Merlino 42590ae64b
Refresh query docs. (#9704)
* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
2020-04-15 16:12:20 -07:00
Maytas Monsereenusorn 8328d91b30
Add missing integration tests for the compaction by the coordinator (#9644)
* Add API to trigger a compaction by the coordinator for integration tests

* Add missing integration tests for the compaction by the coordinator

* address comments
2020-04-15 14:27:33 -07:00
Will Salisbury cda9f41e69
s/S3/GCS/g (#9700)
fix typo [ at least I hope this was a typo… ]
2020-04-14 18:39:54 -07:00
Himanshu ca369e5768
druid-pac4j: add ability to use custom ssl trust store while talking to auth server (#9637)
* druid-pac4j: add ability for custom ssl trust store for talking to auth
server

* fix nimbusds DefaultResourceRetriever name in comment
2020-04-10 18:01:59 -07:00
bolkedebruin ab5ac7f890
Document possible vulnerabilities for the druid-ranger-security (#9649)
* Document possible vulnerabilities for the druid-ranger-security

In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.

* Spelling checker is a bit tight
2020-04-09 10:43:11 -07:00
bolkedebruin 2d99966933
Add Apache Ranger Authorization (#9579) 2020-04-04 18:02:24 +02:00
Maytas Monsereenusorn 1852bf33ea
Add Integration Test for functionality of kinesis ingestion (#9576)
* kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* fix kinesis timeout

* Kinesis IT

* Kinesis IT

* fix checkstyle

* Kinesis IT

* address comments

* fix checkstyle
2020-04-03 09:45:22 -07:00
Neil Volungis 0ac875a8b4
Update docker.md readme to note memory requirements (#9529)
* Update docker.md readme to note memory requirements

* Fix grammatical error

Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>

Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
2020-03-24 03:33:29 -07:00
Clint Wylie bf85ea19b2
roaring bitmaps by default (#9548)
* it is finally time

* fix it

* more docs

* fix doc
2020-03-23 18:15:57 -07:00
Himanshu 5604ac7963
druid extension for OpenID Connect auth using pac4j lib (#8992)
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication

* update version in druid-pac4j pom

* introducing unauthorized resource filter

* authenticated but authorized /unified-webconsole.html

* use httpReq.getRequestURI() for matching callback path

* add documentation

* minor doc addition

* licesne file updates

* make dependency analyze succeed

* fix doc build

* hopefully fixes doc build

* hopefully fixes license check build

* yet another try on fixing license build

* revert unintentional changes to website folder

* update version to 0.18.0-SNAPSHOT

* check session and its expiry on each request

* add crypto service

* code for encrypting the cookie

* update doc with cookiePassphrase

* update license yaml

* make sessionstore in Pac4jFilter private non static

* make Pac4jFilter fields final

* okta: use sha256 for hmac

* remove incubating

* add UTs for crypto util and session store impl

* use standard charsets

* add license header

* remove unused file

* add org.objenesis.objenesis to license.yaml

* a bit of nit changes  in CryptoService  and embedding EncryptionResult for clarity

* rename alg  to cipherAlgName

* take cipher alg name, mode and padding as input

* add java doc  for CryptoService  and make it more understandable

* another  UT for CryptoService

* cache pac4j Config

* use generics clearly in Pac4jSessionStore

* update cookiePassphrase doc to mention PasswordProvider

* mark stuff Nullable where appropriate in Pac4jSessionStore

* update doc to mention jdbc

* add error log on reaching callback resource

* javadoc  for Pac4jCallbackResource

* introduce NOOP_HTTP_ACTION_ADAPTER

* add correct module name in license file

* correct extensions folder name in licenses.yaml

* replace druid-kubernetes-extensions to druid-pac4j

* cache SecureRandom instance

* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 18:15:45 -07:00
Clint Wylie d8833316c4
fix broken links (#9537)
* fix broken links

* missing /

* adjustment
2020-03-22 17:41:18 -07:00
Gian Merlino 54c9325256
SQL support for joins on subqueries. (#9545)
* SQL support for joins on subqueries.

Changes to SQL module:

- DruidJoinRule: Allow joins on subqueries (left/right are no longer
  required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
  handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
  it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
  version of Calcite. Some of these are useful for optimizing joins on
  subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
  relevant constants in CostEstimates.

Other changes:

- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
  of isComplete is enough to let callers know that columns might have
  multiple values, and explicitly setting it to true causes
  ExpressionSelectors to think it definitely has multiple values, and
  treat the inputs as arrays. This behavior interfered with some of the
  new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
  tests.

* Fixes for tests.

* Adjustments.
2020-03-22 16:43:55 -07:00
Clint Wylie 68013fbc64
fix issue where total limit was being applied even when not configured (#9534)
* fix issue where total limit was being applied even when not configured

* fix inspection

* add reserved lane name check to manual laning strategy
2020-03-18 18:05:59 -07:00
Chi Cao Minh e7b3dd9cd1
Update to mysql connector 5.1.48 (#9514) 2020-03-16 10:38:31 -07:00
Clint Wylie 69af760a19
add manual laning strategy, integration test (#9492)
* add manual laning strategy, integration test, json config test

* share percent conversion method

* wrong assert

* review stuffs

* doc adjustments

* more tests

* test adjustment

* adjust docs

* Update index.md
2020-03-13 20:06:55 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Chi Cao Minh 6b02991464
Match GREATEST/LEAST function behavior to other DBs (#9488)
* Match GREATEST/LEAST function behavior

Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.

* Match postgres behavior & handle more SQL types

* Fix imports
2020-03-12 15:10:11 -07:00
Maytas Monsereenusorn e9888f41cb
Modify check java version script to indicate experimental support for Java 11 (#9455)
* Modify check java version script to indicate experimental support for Java 11

* update docs
2020-03-11 09:22:39 -07:00
Himanshu 75a5591448
remove old unused zookeeper dependent lookups code (#9480)
* remove old unused zookeeper dependent lookups code

* make  intellij inspector happy
2020-03-10 12:12:48 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Maytas Monsereenusorn 814f5a9717
add password provider reference to s3 optional cred docs (#9439) 2020-03-09 17:56:42 -07:00
Julian Jaffe eda03630d0
Add OnHeapMemorySegmentWriteOutMediumFactory (#9454)
* Add OnHeapMemorySegmentWriteOutMediumFactory

Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.

* Register OnHeapMemorySegmentWriteOutMediumFactory.

Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.

* Remove unnecessary throws

The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.

* Update SegmentWriteOutMedium docs to include onHeapMemory

Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapSegmentMediumWriteOut option.
2020-03-05 22:34:08 -08:00
Jihoon Son 3016057178
Make Transform an ExtensionPoint (#9319)
* Make Transform an ExtensionPoint

* Add transform to the list of documented extensions

* Add example transform implementation
2020-03-04 12:13:14 -08:00
Jihoon Son 9466ac7c9b
Skip empty files for local, hdfs, and cloud input sources (#9450)
* Skip empty files for local, hdfs, and cloud input sources

* split hint spec doc

* doc for skipping empty files

* fix typo; adjust tests

* unnecessary fluent iterable

* address comments

* fix test

* use the right lists

* fix test

* fix test
2020-03-03 20:51:06 -08:00
Gian Merlino c9faf3e148
Add SQL GROUPING SETS support. (#9122)
* Add SQL GROUPING SETS support.

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
  each grouping set. This is more in line with SQL standard behavior. I think it is okay
  to make this change, since the old behavior was not documented, so users should
  hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
  should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
  latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.

* Fix bugs.

* Fix tests.

* PR updates.

* Grouping class hygiene.
2020-02-26 08:52:39 -08:00
Maytas Monsereenusorn 92fb83726b
Add support for optional aws credentials for s3 for ingestion (#9375)
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* fix build failure

* fix failing build

* fix failing build

* Code cleanup

* fix failing test

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* pass s3ConfigProperties for split

* lazy init s3client

* update docs

* fix docs check

* address comments

* add ServerSideEncryptingAmazonS3.Builder

* fix failing checkstyle

* fix typo

* wrap the ServerSideEncryptingAmazonS3.Builder in a provider

* added java docs for S3InputSource constructor

* added java docs for S3InputSource constructor

* remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
2020-02-25 20:59:53 -08:00
zachjsh d771b42ed1
Move Azure extension into Core (#9394)
* Move Azure extension into Core

Moving the azure extension into Core.

* * Fix build failure

* * Add The MIT License (MIT) to list of compatible licenses

* * Address review comments

* * change reference to contrib azure to core azure

* * Fix spelling mistakes.
2020-02-25 17:49:16 -08:00
als-sdin f619903403
Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400) 2020-02-25 13:15:55 -08:00
Chi Cao Minh 7fc99ee206
Add common optional dependencies for extensions (#9399)
* Add common optional dependencies for extensions

Include hadoop-aws and postgres JDBC connector jar to improve
out-of-the-box experience for extensions. The mysql JDBC connector jar
is not bundled as it is GPL.

* Update docs

* Fix typo
2020-02-25 00:04:00 -08:00
Jihoon Son 3bc7ae782c
Create splits of multiple files for parallel indexing (#9360)
* Create splits of multiple files for parallel indexing

* fix wrong import and npe in test

* use the single file split in tests

* rename

* import order

* Remove specific local input source

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc and error msg

* fix build

* fix a test and address comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-02-24 17:34:39 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
zachjsh f707064bed
Add Azure config options for segment prefix and max listing length (#9356)
* Add Azure config options for segment prefix and max listing length

Added configuration options to allow the user to specify the prefix
within the segment container to store the segment files. Also
added a configuration option to allow the user to specify the
maximum number of input files to stream for each iteration.

* * Fix test failures

* * Address review comments

* * add dependency explicitly to pom

* * update docs

* * Address review comments

* * Address review comments
2020-02-21 14:12:03 -08:00
Jihoon Son 141d8dd875
Enable druid.coordinator.kill.pendingSegments.on by default (#9385)
* Enable druid.coordinator.kill.pendingSegments.on by default

* checkstyle
2020-02-21 13:13:49 -08:00
Björn Zettergren 30c24df4d3
Add config option for namespacePrefix (#9372)
* Add config option for namespacePrefix

opentsdb emitter sends metric names to opentsdb verbatim as what druid
names them, for example "query.count", this doesn't fit well with a
central opentsdb server which might have namespaced metrics, for example
"druid.query.count". This adds support for adding an optional prefix.

The prefix also gets a trailing dot (.), after it, so the metric name
becomes <namespacePrefix>.<metricname>

configureable as "druid.emitter.opentsdb.namespacePrefix", as
documented.

Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com>

* Spelling for PR #9372

Added "namespacePrefix" to .spelling exceptions, it's a variable name
used in documentation for opentsdb-emitter.

* fixing tests for PR #9372

changed naming of variables to be more descriptive
added test of prefix being an empty string: "".
added a conditional to buildNamespacePrefix to check for empty string
being fed if EventConverter called without OpentsdbEmitterConfig
instance.

* fixing checkstyle errors for PR #9372

used == to compare literal string, should be equals()

* cleaned up and updated PR #9372

Created a buildMetric function as suggested by clintropolis, and
removed redundant tests for empty strings as they're only used when
calling EventConverter directly without going through
OpentsdbEmitterConfig.

* consistent naming of tests PR #9372

Changed names of tests in files to match better with what it was
actually testing

changed check for Strings.isNullOrEmpty to just check for `null`, as
empty string valued `namespacePrefix` is handled in
OpentsdbEmitterConfig.

Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>
2020-02-20 14:01:41 -08:00
Clint Wylie b408a6d774
sql support for dynamic parameters (#6974)
* sql support for dynamic parameters

* fixup

* javadocs

* fixup from merge

* formatting

* fixes

* fix it

* doc fix

* remove druid fallback self-join parameterized test

* unused imports

* ignore test for now

* fix imports

* fixup

* fix merge

* merge fixup

* fix test that cannot vectorize

* fixup and more better

* dependency thingo

* fix docs

* tweaks

* fix docs

* spelling

* unused imports after merge

* review stuffs

* add comment

* add ignore text

* review stuffs
2020-02-19 13:09:20 -08:00
Clint Wylie 2e54755a03
add docker tutorial, friendlier docker-compose.yml, experimental java 11 dockerfile (#9262)
* add docker tutorial, experimental java 11 dockerfile

* fix typo

* spelling

* doc adjustments
2020-02-13 21:24:45 -08:00
Maytas Monsereenusorn c30579e47b
ANY Aggregator should not skip null values implementation (#9317)
* ANY Aggregator should not skip null values implementation

* add tests

* add more tests

* Update documentation

* add more tests

* address review comments

* optimize StringAnyBufferAggregator

* fix failing tests

* address pr comments
2020-02-12 14:01:41 -08:00
Clint Wylie c3ebb5eb65
variance aggregator support for double columns (#9076)
* variance aggregator support for double column instead of casting to float

* docs

* everything in its right place

* checkstyle

* adjustments
2020-02-12 09:32:42 -08:00
Dusan Maric ebd199da73
docs: extensions-core: mysql: fix MySQL connector library Maven Central URL (#9344) 2020-02-10 21:54:46 -08:00
Atul Mohan 7968524b01
Add Pig-specific file handling to Avro parser (#9258)
* Add processing for data files from AvroStorage

* Add words to spellings file
2020-02-10 21:53:11 -08:00
Clint Wylie b55657cc26
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs

* fix paths

* Update protobuf.md

* Update protobuf.md
2020-02-07 09:26:52 -08:00
Lucas Capistrant 2e1dbe598c
Create new dynamic config to pause coordinator helpers when needed (#9224)
* Create new dynamic config to pause coordinator helpers when needed

* Fix spelling mistakes flagged in Travis build

* Add an integration test for coordinator pause dynamic config

* Improve documentation for new dynamic coordinator config and remove un-needed info logs in favor of debug

* address naming convention of 'deep store' vs 'deep storage' in new configs doc line

* Fix newline at end of configuration index.md

* Last try to resolve newline issue in configuration readme

* fix spell checks from travis build

* Fix another flagges spelling error from Travis
2020-02-05 15:33:42 -08:00
Aditya 868fdeb384
GREATEST/LEAST post-aggregators in SQL (#8719)
* implement shell for greatest sql aggregator with hardcoded long values

* implement functional long greatest aggregator for direct access columns

* implement greatest & least sql aggregators for long & double types using abstract base class

* add javadocs, unit tests & handling for floats for greatest/least postaggregations

* minor checkstyle fix

* improve naming for the test cases

* make inner class static

* remove blank lines to retest travis build

* change trivial text to rerun travis build

* implement suggested updates for greatest/least sql aggs & fix checkstyle issues

* fix stale comments in greatest/least sql aggs abstract base

* Update sql.md

* improve sql function definitions for greatest/least sql aggs

* add more tests for greatest/least sql aggs

* add tests to cover invalid greatest/least sql expressions

* rename & reorder greatest least sql tests
2020-02-04 17:08:53 -08:00
sthetland 556a3861ed
Make docs on reset supervisor operation scarier (#9288)
* Update kafka-ingestion.md

Companion doc update to #9253, intended to make a supervisor reset scarier

* Update kinesis-ingestion.md
2020-02-04 15:30:31 -08:00
Roman Leventov b9186f8f9f Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306)
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error

* Fix brace

* Import order

* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill

* Fix tests

* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY

* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters

* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig

* More variable and method renames

* Rename MetadataSegments to SegmentsMetadata

* Javadoc update

* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs

* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()

* Reorder imports

* Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers

* Complete merge

* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests

* Remove MetadataSegmentManager

* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments

* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder

* Fix inspections

* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest

* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods

* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator

* Unused import

* Optimize imports

* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()

* Unused import

* Update terminology in datasource-view.tsx

* Fix label in datasource-view.spec.tsx.snap

* Fix lint errors in datasource-view.tsx

* Doc improvements

* Another attempt to please TSLint

* Another attempt to please TSLint

* Style fixes

* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)

* Try to fix docs build issue

* Javadoc and spelling fixes

* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments

* Address more comments
2020-01-27 11:24:29 -08:00
Zhenxiao Luo 479c09751c Add MostAvailableSizeStorageLocationSelectorStrategy (#8879)
* Add MostAvailableSize LocationSelectorStrategy

* Add doc for mostAvailableSize strategy

* Fix docs for mostAvailableSize
2020-01-23 13:42:03 -08:00
sthetland 83ddc8de1e Update data-formats.md (#9238)
* Update data-formats.md

Field error and light rewording of new Avro material (and working through the doc authoring process).

* Update data-formats.md

Make default statements consistent. Future change: s/=/is.
2020-01-22 15:00:53 -08:00
Clint Wylie 8011211a0c first/last aggregators and nulls (#9161)
* null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs

* initially null or not based on config

* review stuff, make string first/last consistent with null handling of numeric columns, more tests

* docs

* handle nil selectors, revert to primitive first/last types so groupby v1 works...
2020-01-20 11:51:54 -08:00
Suneet Saldanha 180c622e0f Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Gian Merlino d21054f7c5
Remove the deprecated interval-chunking stuff. (#9216)
* Remove the deprecated interval-chunking stuff.

See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details.

* Remove unused import.

* Remove chunkInterval too.
2020-01-19 17:14:23 -08:00
Suneet Saldanha 93167188ea Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:49:33 -08:00
Jihoon Son 153495068b Doc update for the new input source and the new input format (#9171)
* Doc update for new input source and input format.

- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md

* parquet

* add warning for range partitioning with sequential mode

* hdfs + s3, gs

* add fs impl for gs

* address comments

* address comments

* gcs
2020-01-17 15:52:05 -08:00
singh 936b9bdfd0 add deets about the keyfile (#9209) 2020-01-17 11:24:49 -08:00
Maytas Monsereenusorn 42359c93dd Implement ANY aggregator (#9187)
* Implement ANY aggregator

* Add copyright headers

* Add unit tests

* fix BufferAggregator

* Fix bug in BufferAggregator

* hook up the SQL command

* add check for buffer aggregator

* Address comment

* address comments

* add docs

* Address comments

* add more tests for numeric columns that have null values when run in sql compatible null mode

* fix checkstyle errors

* fix failing tests

* fix failing tests
2020-01-16 14:40:32 -08:00
Suneet Saldanha 92ac22d060 Link javaOpts to middlemanager runtime.properties docs (#9101)
* Link javaOpts to middlemanager runtime.properties docs

* fix broken link

* reword config links
2020-01-15 21:22:49 -08:00
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
Jonathan Wei d1500c1328 Update Kinesis resharding information about task failures (#9104) 2020-01-07 15:44:48 -08:00
Jonathan Wei 58d337186b
Graduation update for ASF release process guide and download links (#9126)
* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml
2020-01-06 15:00:33 -06:00
Jonathan Wei aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Jihoon Son 3c31493772 Add missing docs for http client configurations (#9054)
* Add missing docs for http client configurations

* fix typo

* backticks
2019-12-19 17:41:04 -08:00
Chi Cao Minh 6178f05da6 Fail superbatch range partition multi dim values (#9058)
* Fail superbatch range partition multi dim values

Change the behavior of parallel indexing range partitioning to fail
ingestion if any row had multiple values for the partition dimension.
After this change, the behavior matches that of hadoop indexing.
(Previously, rows with multiple dimension values would be skipped.)

* Improve err msg, rename method, rename test class
2019-12-18 10:14:03 -08:00
Clint Wylie 6881535b48
docs - clarify cache parameters (#9020) 2019-12-13 16:53:45 -08:00
Suneet Saldanha 3325da1718 Allow startup scripts to specify java home (#9021)
* Allow startup scripts to specify java home

The startup scripts now look for java in 3 locations. The order is from
most related to druid to least, ie
    ${DRUID_JAVA_HOME}
    ${JAVA_HOME}
    ${PATH}

* Update fn names and clean up code

* final round of fixes

* fix spellcheck
2019-12-12 21:36:00 -08:00
Himanshu 9236dd9467
optionally enable Jetty ForwardedRequestCustomizer (#9010)
* optionally enable Jetty ForwardedRequestCustomizer

* fix doc build
2019-12-12 17:00:08 -08:00
Benjamin Hopp 13c33c1766 Update architecture.md (#9015) 2019-12-11 19:05:50 -08:00
Jihoon Son e5e1e9c4ee
Fix broken master (#9005)
* Multibinding for NodeRole

* Fix endpoints

* fix doc

* fix test
2019-12-11 15:56:36 -08:00
Parag Jain 24fe824055 add readiness endpoints to processes having initialization delays (#8841) 2019-12-10 17:26:13 -08:00
Chi Cao Minh 3de7ab8523 DataSketches jars in core (#9003)
Having DataSketches jars in core will allow potential improvements, for
example:
- Provide an alternative implementation of HLL:
  https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
- Range partitioning for native parallel batch indexing without having
  the user load extensions on the classpath

Dev mailing list discussion:
https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E
2019-12-10 14:02:34 -08:00
Chi Cao Minh bab78fc80e Parallel indexing single dim partitions (#8925)
* Parallel indexing single dim partitions

Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.

The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:

1) In parallel, determine the distribution of dimension values for each
   input source split.

   `PartialDimensionDistributionTask` uses `StringSketch` to generate
   the approximate distribution of dimension values for each input
   source split. If the rows are ungrouped,
   `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
   uses a Bloom filter to skip rows that would be grouped. The final
   distribution is sent back to the supervisor via
   `DimensionDistributionReport`.

2) The range partitions are determined.

   In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
   supervisor uses `StringSketchMerger` to merge the individual
   `StringSketch`es created in the preceding phase. The merged sketch is
   then used to create the range partitions.

3) In parallel, generate partial range-partitioned segments.

   `PartialRangeSegmentGenerateTask` uses the range partitions
   determined in the preceding phase and
   `RangePartitionCachingLocalSegmentAllocator` to generate
   `SingleDimensionShardSpec`s.  The partition information is sent back
   to the supervisor via `GeneratedGenericPartitionsReport`.

4) The partial range segments are grouped.

   In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
   the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
   necessary for the next phase.

5) In parallel, merge partial range-partitioned segments.

   `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
   retrieve the partial range-partitioned segments generated earlier and
   then merges and publishes them.

* Fix dependencies & forbidden apis

* Fixes for integration test

* Address review comments

* Fix docs, strict compile, sketch check, rollup check

* Fix first shard spec, partition serde, single subtask

* Fix first partition check in test

* Misc rewording/refactoring to address code review

* Fix doc link

* Split batch index integration test

* Do not run parallel-batch-index twice

* Adjust last partition

* Split ITParallelIndexTest to reduce runtime

* Rename test class

* Allow null values in range partitions

* Indicate which phase failed

* Improve asserts in tests
2019-12-09 23:05:49 -08:00
Vadim Ogievetsky 0330744793 Docs: bold Java 8 requirement (#8996)
* bold Java 8 req

* add warning box
2019-12-09 20:23:07 -08:00
Roman Leventov 1c62987783
Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702)
* Add SelfDiscoveryResource

* Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself).

* Extended docs

* Fix brace

* Remove redundant throws in Lifecycle.Handler.stop()

* Import order

* Remove unresolvable link

* Address comments

* tmp

* tmp

* Rollback docker changes

* Remove extra .sh files

* Move filter

* Fix SecurityResourceFilterTest
2019-12-08 18:47:58 +03:00
Clint Wylie 441515cb50 update dump-segment docs so example command works (#8998)
* update dump-segment docs so example command works

* not everyone uses bash
2019-12-07 06:36:46 -08:00
Jonathan Wei c949a25210
Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982)
* Add Druid input source and format

* Inherit dims/metrics from segment

* Add ingest segment firehose reindexing test

* Remove unnecessary module

* Fix unit tests, checkstyle

* Add doc entry

* Fix dimensionExclusions handling, add parallel index integration test

* Add spelling exclusion

* Address some PR comments

* Checkstyle

* wip

* Address rest of PR comments

* Address PR comments
2019-12-05 16:50:00 -08:00
Clint Wylie 5ecdf94d83
add 'prefixes' support to google input source (#8930)
* add prefixes support to google input source, making it symmetrical-ish with s3

* docs

* more better, and tests

* unused

* formatting

* javadoc

* dependencies

* oops

* review comments

* better javadoc
2019-12-04 21:01:10 -08:00
Lucas Capistrant 8dd9a8cb15 Small doc fix for baseTaskDir conf (#8978) 2019-12-04 14:07:03 -08:00
Clint Wylie a48784a1fd dropwizard-emitter doc fixes (#8988) 2019-12-04 12:52:58 -08:00
Fangyuan Deng 187cf0dd3f [Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988)
* historical fast restart by lazy load columns metadata

* delete repeated code

* add documentation for druid.segmentCache.lazyLoadOnStart

* fix unit test fail

* fix spellcheck

* update docs

* update docs mentioning a catch
2019-12-03 09:47:01 -08:00
Jonathan Wei 00ce18a0ea
Additional Kinesis resharding fixes (#8870)
* Additional Kinesis resharding fixes

* Address PR comments

* Remove unused method

* Adjust SegmentTransactionalInsertAction null handling

* Check for unchanged metadata on empty publish

* Add logs for empty publish

* Fix javadoc

* Clear offset when invalid endOffsets are seen

* Fix LGTM alert

* Fix build

* Add resharding note to Kinesis docs

* Checkstyle

* Spelling

* Address PR comments

* Checkstyle
2019-11-28 12:59:01 -08:00
Clint Wylie 4458113375
S3 input source (#8903)
* add s3 input source for native batch ingestion

* add docs

* fixes

* checkstyle

* lazy splits

* fixes and hella tests

* fix it

* re-use better iterator

* use key

* javadoc and checkstyle

* exception

* oops

* refactor to use S3Coords instead of URI

* remove unused code, add retrying stream to handle s3 stream

* remove unused parameter

* update to latest master

* use list of objects instead of object

* serde test

* refactor and such

* now with the ability to compile

* fix signature and javadocs

* fix conflicts yet again, fix S3 uri stuffs

* more tests, enforce uri for bucket

* javadoc

* oops

* abstract class instead of interface

* null or empty

* better error
2019-11-25 22:31:19 -08:00
Jihoon Son a2e6de4b16 Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924)
* Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor

* Fix docs and javadoc

* Add unit tests for large or small estimated num splits

* add override
2019-11-23 01:38:08 -08:00
Clint Wylie 7250010388 add parquet support to native batch (#8883)
* add parquet support to native batch

* cleanup

* implement toJson for sampler support

* better binaryAsString test

* docs

* i hate spellcheck

* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest

* add comment, fix some stuff

* adjustments

* fix accident

* tweaks
2019-11-22 10:49:16 -08:00
SeKing 9955107e8e RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461) 2019-11-22 03:46:54 -08:00
Surekha d628bebbd7 Make supervisor API similar to submit task API (#8810)
* accept spec or dataSchema, tuningConfig, ioConfig while submitting task json

* fix test

* update docs

* lgtm warning

* Add original constructor back to IndexTask to minimize changes

* fix indentation in docs

* Allow spec to be specified in supervisor schema

* undo IndexTask spec changes

* update docs

* Add Nullable and deprecated annotations

* remove deprecated configs from SeekableStreamSupervisorSpec

* remove nullable annotation
2019-11-20 10:04:41 -08:00
Clint Wylie d67c3c7aed document SQL compatible null handling mode (#8894)
* document SQL compatible null handling mode

* adjustments

* fix docs

* review changes
2019-11-20 06:52:20 -08:00
Clint Wylie 074a45219d add google cloud storage InputSource for native batch (#8907)
* add google cloud storage InputSource for native batch

* rename

* checkstyle

* fix

* fix spelling

* review comments
2019-11-19 19:49:43 -08:00
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3:
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12:
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645:
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Chi Cao Minh d60978343a Improve missing JDBC driver error for lookups (#8872)
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
2019-11-18 11:42:38 -08:00
Jihoon Son 1611792855
Add InputSource and InputFormat interfaces (#8823)
* Add InputSource and InputFormat interfaces

* revert orc dependency

* fix dimension exclusions and failing unit tests

* fix tests

* fix test

* fix test

* fix firehose and inputSource for parallel indexing task

* fix tc

* fix tc: remove unused method

* Formattable

* add needsFormat(); renamed to ObjectSource; pass metricsName for reader

* address comments

* fix closing resource

* fix checkstyle

* fix tests

* remove verify from csv

* Revert "remove verify from csv"

This reverts commit 1ea7758489.

* address comments

* fix import order and javadoc

* flatMap

* sampleLine

* Add IntermediateRowParsingReader

* Address comments

* move csv reader test

* remove test for verify

* adjust comments

* Fix InputEntityIteratingReader

* rename source -> entity

* address comments
2019-11-15 09:22:09 -08:00
Clint Wylie cc54b2a9df support for array expressions in TransformSpec with ExpressionTransform (#8744)
* transformSpec + array expressions

changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes

* remove docs about being unsupported

* re-arrange test assert

* unused imports

* imports

* fix tests

* preserve types

* suppress warning, fixes, add test

* formatting

* cleanup

* better list to array type conversion and tests

* fix oops
2019-11-13 11:04:37 -08:00
fst0 80dbf44fca Add reference to druid.storage.type (#8857)
* Add reference to `druid.storage.type`

This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct.

* Update s3.md

Add global storage parameter to knob table.

* Update s3.md
2019-11-13 10:03:41 -08:00
Lucas Capistrant a066cc5648 Fix groupMapping endpoint URIs in druid-basic-security doc (#8847) 2019-11-12 21:12:34 +05:30
Jonathan Wei 75ea0d592a Add more datasketches doubles sketch SQL functions (#8843)
* Add more datasketches doubles sketch SQL postaggs

* style and lgtm
2019-11-08 18:05:06 -08:00
Gian Merlino 0e8c3f74d0 SQL: EARLIEST, LATEST aggregators. (#8815)
* SQL: EARLIEST, LATEST aggregators.

I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.

* Finalify.

* SQL updates.

* Adjust aggregator calls.

* Validations, test updates.

* Review docs.
2019-11-08 16:29:25 -08:00
Clint Wylie 7aafcf8bca parallel broker merges on fork join pool (#8578)
* sketch of broker parallel merges done in small batches on fork join pool

* fix non-terminating sequences, auto compute parallelism

* adjust benches

* adjust benchmarks

* now hella more faster, fixed dumb

* fix

* remove comments

* log.info for debug

* javadoc

* safer block for sequence to yielder conversion

* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool

* smooth yield rate adjustment, more logs to help tune

* cleanup, less logs

* error handling, bug fixes, on by default, more parallel, more tests

* remove unused var

* comments

* timeboundary mergeFn

* simplify, more javadoc

* formatting

* pushdown config

* use nanos consistently, move logs back to debug level, bit more javadoc

* static terminal result batch

* javadoc for nullability of createMergeFn

* cleanup

* oops

* fix race, add docs

* spelling, remove todo, add unhandled exception log

* cleanup, revert unintended change

* another unintended change

* review stuff

* add ParallelMergeCombiningSequenceBenchmark, fixes

* hyper-threading is the enemy

* fix initial start delay, lol

* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2

* fix those important style issues with the benchmarks code

* lazy sequence creation for benchmarks

* more benchmark comments

* stable sequence generation time

* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs

* add jmh thread based benchmarks, cleanup some stuff

* oops

* style

* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose

* retool benchmark to allow modeling more typical heterogenous heavy workloads

* spelling

* fix

* refactor benchmarks

* formatting

* docs

* add maxThreadStartDelay parameter to threaded benchmark

* why does catch need to be on its own line but else doesnt
2019-11-07 11:58:46 -08:00
Jad Naous ce3c0dae4d Add note on JDBC libs for lookups (#8825)
* Add note on JDBC libs for lookups

* Fix directory and additional "the"
2019-11-06 13:31:26 -08:00
Himanshu 5adc8212b4
add documentation for druid docker and k8s operator (#8802)
* add documentation for druid docker and k8s operator

* address review comment and add Kubernetes to spelling file
2019-11-06 12:56:21 -08:00
Tijo Thomas 27acdbd2b8 'hadoop fs' command is deprecated . The new approach is to use hdfs command . Replacing 'hadoop fs' command with 'hdfs dfs' (#8762) 2019-11-01 04:42:10 +05:30
Giuseppe Martino 9c171e2b1f Message rejection absolute date (#8656)
* Add option lateMessageRejectionStartDate

* Use option lateMessageRejectionStartDate

* Fix tests

* Add lateMessageRejectionStartDate to kafka indexing service

* Update tests kafka indexing service

* Fix tests for KafkaSupervisorTest

* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig

* Fix var name

* Update documentation

* Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.
2019-10-31 15:13:02 -07:00
Clint Wylie 3ff5e02237 remove select query (#8739)
* remove select query

* thanks teamcity

* oops

* oops

* add back a SelectQuery class that throws RuntimeExceptions linking to docs

* adjust text

* update docs per review

* deprecated
2019-10-30 19:29:56 -07:00
Gian Merlino 7605c23354 Remove Tranquility configs and certain doc references. (#8793)
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
2019-10-30 16:30:16 -07:00
Gian Merlino c922d2c3c9 Use bundled ZooKeeper in tutorials. (#8792) 2019-10-30 16:17:28 -07:00
Gian Merlino aa81253cf4 Fix typos. (#8767) 2019-10-28 12:47:01 -07:00
Gian Merlino b65d2ac648 Add HDFS firehose (#8754)
* Add HDFS firehose.

* Tests, support for lists of paths.

* Fixups.

* Update list of firehoses.

* Wildcards is a word.
2019-10-28 08:07:38 -07:00
Vadim Ogievetsky f9b94a5db1
Docs: remove self link (#8760)
This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.
2019-10-27 22:33:22 -07:00
Clint Wylie 09f92818d4 update druid expression docs to indicate that array functions do not work at indexing time (#8734)
* update druid expression docs to indicate that array functions are not supported in transformSpec

* fix unrelated spelling check
2019-10-24 22:04:08 -07:00
Eyal Yurman 14e33428f0 Moving Average extention: Add Sum averagers (#8511)
* Add sum averagers.

* avoid casting double to long.
2019-10-24 16:37:24 -07:00
Vadim Ogievetsky cc3650ee3b fix doc headers (#8729) 2019-10-24 11:17:39 -07:00
Jihoon Son f5b9bf5525 Cluster-wide configuration for query vectorization (#8657)
* Cluster-wide configuration for query vectorization

* add doc

* fix build

* fix doc

* rename to QueryConfig and add javadoc

* fix checkstyle

* fix variable names
2019-10-23 21:44:28 +08:00
David Glasser b453fda251 docs: clarify native batch ingestion w/ overlapping segments (#8720)
I was confused by a paragraph in the docs that I myself wrote!
2019-10-22 21:01:56 -07:00
Jad Naous 2ab43aa688 Update tutorial-kerberos-hadoop.md (#8689)
* Update tutorial-kerberos-hadoop.md

Fix up what looks like a bad merge.

* Update tutorial-kerberos-hadoop.md

Fix spelling issues
2019-10-22 14:40:41 -07:00