12864 Commits

Author SHA1 Message Date
George Shiqi Wu
76e70654ac
Fix issues when startup timeout is hit (#14425) 2023-06-14 11:49:55 -07:00
Vadim Ogievetsky
6fd28fc185
Web console: split the Ingestion view into two views: Supervisors and Tasks (#14395)
* init split

* don't crash if unable to get running tasks

* update snapshots

* push down state into call

* googies

* simplify

* update e2e tests

* feedback fixes

* update e2e tests

* better icons

* fix test

* adjust colors
2023-06-14 10:42:30 -07:00
Clint Wylie
8454cc619a
auto columns fixes (#14422)
changes:
* auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR)
* fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs
* fix bug with bound filter when upper is null equivalent but is strict
2023-06-14 08:57:06 -07:00
Abhishek Radhakrishnan
be5a6593a9
Reset RuntimeInfo to fix flaky test ParametrizedUriEmitterConfigTest. (#14405)
* Add injector so JVM settings are correctly set up and bound for the test.

* Add VisibleForTesting IDE annotation.

* spacing
2023-06-13 18:07:51 -07:00
Abhishek Radhakrishnan
b8495d45a1
Expose Druid functions in INFORMATION_SCHEMA.ROUTINES table. (#14378)
* Add INFORMATION_SCHEMA.ROUTINES to expose Druid operators and functions.

* checkstyle

* remove IS_DETERMISITIC.

* test

* cleanup test

* remove logs and simplify

* fixup unit test

* Add docs for INFORMATION_SCHEMA.ROUTINES table.

* Update test and add another SQL query.

* add stuff to .spelling and checkstyle fix.

* Add more tests for custom operators.

* checkstyle and comment.

* Some naming cleanup.

* Add FUNCTION_ID

* The different Calcite function syntax enums get translated to FUNCTION

* Update docs.

* Cleanup markdown table.

* fixup test.

* fixup intellij inspection

* Review comment: nullable column; add a function to determine function syntax.

* More tests; add non-function syntax operators.

* More unit tests. Also add a separate test for DruidOperatorTable.

* actually just validate non-zero count.

* switch up the order

* checkstyle fixes.
2023-06-13 15:44:04 -04:00
Clint Wylie
61120dc49a
fix Kafka input format to throw ParseException if timestamp is missing (#14413) 2023-06-13 09:00:11 -07:00
Rishabh Singh
66c3cc1391
Handle unparseable SupervisorSpec in metadata store (#14382)
Changes:
- Skip a supervisor spec entry which cannot be deserialised into a `SupervisorSpec` object.
- Log an error for the unparseable spec
2023-06-13 08:02:01 +05:30
Abhishek Radhakrishnan
1c76ebad3b
Minor doc updates. (#14409)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-06-12 15:24:48 -07:00
Abhishek Radhakrishnan
326f2c5020
Add more statement attributes to explain plan result. (#14391)
This PR adds the following to the ATTRIBUTES column in the explain plan output:
- partitionedBy
- clusteredBy
- replaceTimeChunks

This PR leverages the work done in #14074, which added a new column ATTRIBUTES
to encapsulate all the statement-related attributes.
2023-06-12 19:18:02 +05:30
Rishabh Singh
8b212e73d7
Add method to authorize native query using authentication result (#14376) 2023-06-12 11:06:00 +05:30
Clint Wylie
b5f45832b1
Add 'Flaky test' issue template (#14394)
* Add 'Flaky test' issue template

* Update flaky_test.md
2023-06-11 19:02:38 -07:00
Adarsh Sanjeev
267cbac6ff
Add logs for deleting files using storage connector (#14350)
* Add logs for deleting files using storage connector

* Address review comments

* Update log message format
2023-06-11 21:24:30 +05:30
Kashif Faraz
6e158704cb
Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271)
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
2023-06-10 12:15:44 +05:30
Abhishek Radhakrishnan
31c386ee1b
Fixup typo and java code snippets in JDBC docs. (#14399) 2023-06-09 12:39:21 -07:00
John Gozde
4d146ca87d
Upgrades the React dependency to v18 (#14380)
* Use react 18

* Remove deprecated usage of Toaster

* Make AppToaster lazy

* Update testing-library, snapshots

* Licenses

* Document lazy-init, add license header
2023-06-09 12:09:13 -07:00
Abhishek Radhakrishnan
23c2dcaf8d
Add NullHandling module initialization for LookupDimensionSpecTest (#14393) 2023-06-09 09:07:32 +05:30
Benedict Jin
5eb2556566
Add workflow links to README to jump into detailed pages (#14383) 2023-06-09 08:31:50 +05:30
imply-cheddar
87149d5975
Remove AbstractIndex (#14388)
The class apparently only exists to add a toString()
method to Indexes, which basically just crashes any debugger
on any meaningfully sized index.  It's a pointless
abstract class that basically only causes pain.
2023-06-08 19:52:16 -07:00
317brian
ff577a69a5
doc: escape tags in markdown in prepration for docusaurus2 (#14379)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-06-08 11:26:18 -07:00
Kashif Faraz
12e8fa5c97
Prevent coordinator from getting stuck if leadership changes during coordinator run (#14385)
Changes:
- Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`.
1 minute is the typical time for a full coordinator run and is more than enough time for cost
computations of a single segment.
- Raise an alert if an exception is encountered while computing costs and if the executor has
not been shutdown. This is because a shutdown is intentional and does not require an alert.
2023-06-08 15:29:20 +05:30
Atul Mohan
6a4cbab4b8
Upgrade parquet-mr version (#14070)
* Upgrade parquet version

* Move parquet version to hadoop3

* Fix license

* Exclude audience annotations
2023-06-07 08:54:54 -07:00
Gian Merlino
6370769cbf
Fix documentation for druid.query.scheduler.numThreads. (#14381)
* Fix documentation for druid.query.scheduler.numThreads.
2023-06-07 14:48:08 +05:30
Soumyava
01b22ca022
Hll Sketch and Theta sketch estimate can now be used as an expression (#14312)
* Hll Sketch estimate can now be used as an expression
* Theta sketch estimate now can be used as an expression
2023-06-06 20:14:25 -07:00
Abhishek Radhakrishnan
2d258a95ad
Fix EARLIEST_BY/LATEST_BY signature and include function name in signature. (#14352)
* Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures.

* Single quote function signatures, space between args and remove \n.

* fixup UT assertion
2023-06-06 09:41:05 -07:00
Laksh Singla
5da601c47e
fix npe (#14369) 2023-06-06 17:01:42 +05:30
John Gozde
cfc2a8d286
Switch to @blueprint/datetime2 (#14371)
* Bump blueprint packages

* Switch to datetime2 components

* Update licenses

* Update snapshots
2023-06-05 22:18:05 -07:00
Gian Merlino
a0d49baad6
MSQ: Fix issue with rollup ingestion and aggregators with multiple names. (#14367)
The same aggregator can have two output names for a SQL like:

  INSERT INTO foo
  SELECT x, COUNT(*) AS y, COUNT(*) AS z
  FROM t
  GROUP BY 1
  PARTITIONED BY ALL

In this case, the SQL planner will create a query with a single "count"
aggregator mapped to output names "y" and "z". The prior MSQ code did
not properly handle this case, instead throwing an error like:

  Expected single output for query column[a0] but got [[1, 2]]
2023-06-06 10:28:41 +05:30
John Gozde
c14e54cf93
Remove context params from class component ctors (#14366) 2023-06-05 11:15:28 -07:00
317brian
49c056af17
docs: add basic contributor guide for docs (#14365)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-06-05 10:53:17 -07:00
Tejaswini Bandlamudi
8e4f003f02
Fix flaky Revised ITs failures on GHA runners (#14348)
* Fix read timed out failures and remove containers before test

* remove containers before loading images

* add labels to IT docker containers, download stable minio docker image release instead of latest
2023-06-05 18:58:54 +05:30
Abhishek Agarwal
139156cf6b
Reduce the spam in broker logs (#14368) 2023-06-05 18:56:34 +05:30
Katya Macedo
7fd215b2e7
Document storeCompactionState (#14354) 2023-06-02 11:09:04 -07:00
Harini Rajendran
4ff6026d30
Adding SegmentMetadataEvent and publishing them via KafkaEmitter (#14281)
In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.
2023-06-02 21:28:26 +05:30
Andreas Maechler
45014bd5b4
Handle all types of exceptions when initializing input source in sampler API (#14355)
The sampler API returns a `400 bad request` response if it encounters a `SamplerException`.
Otherwise, it returns a generic `500 Internal server error` response, with the message
"The RuntimeException could not be mapped to a response, re-throwing to the HTTP container".

This commit updates `RecordSupplierInputSource` to handle all types of exceptions instead of just
`InterruptedException`and wrap them in a `SamplerException` so that the actual error is
propagated back to the user.
2023-06-02 19:43:53 +05:30
Abhishek Agarwal
b482fda503
Ignore misc.xml (#14362) 2023-06-02 12:00:52 +05:30
Andreas Maechler
55effd92cf
Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-06-02 08:18:41 +05:30
317brian
70952c0977
docs: add sql array functions to nav (#14361)
* docs: add sql array functions to nav

* fix typo

* add sql array functions to list

* fix spelling errors
2023-06-01 16:45:27 -07:00
zachjsh
04a82da63d
Input source security fixes (#14266)
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
2023-06-01 16:37:19 -07:00
zachjsh
e75fb8e8e3
Account for data format and compression in MSQ auto taskAssignment (#14307)
### Description

This change allows for consideration of the input format and compression  when computing how to split the input files among available tasks, in MSQ ingestion, when considering the value of the  `maxInputBytesPerWorker` query context parameter. This query parameter allows users to control the maximum number of bytes, with granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, this context parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size, reported by the file system.  We assume uncompressed newline delimited json as a baseline, with scaling factor of `1`. This means that when computing the byte weight that a file has towards the input splitting, we take the file size as is, if uncompressed json, 1:1. It was found during testing that gzip compressed json, and parquet, has scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively, when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB, to 512 MiB
2023-06-01 12:53:49 -07:00
Abhishek Radhakrishnan
d60290e76d
Remove extraneous apostrophe in the native batch docs (#14358) 2023-06-01 08:57:41 -07:00
Katya Macedo
2da84de87f
docs: remove the note about segments (#14161) 2023-05-31 16:37:19 -07:00
317brian
2012a6bd8e
Docs: fix broken link to Python API jupyter notebook (#14332) 2023-05-31 08:12:27 +05:30
Charles Smith
37cb76d545
fixes dataSourceName varaible ref (#14340) 2023-05-30 13:15:27 -07:00
panhongan
c244c3de53
fix hdfs initialization issue (#14276)
* fix hdfs initialization issue

* add PR

* remove conf settings

* Improve comments

* move hdfs storage validation to start handler

* restore exception
2023-05-30 12:41:54 -07:00
Nhi Pham
70c06fc0e1
Advise against using WEEK granularity for Native Batch and MSQ (#14341)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-05-30 11:40:12 -07:00
Rishabh Singh
2086ff88bc
Add logging for task stop operations (#14192)
Log more details when task cannot be stopped for various reasons
2023-05-30 18:50:52 +05:30
Pramod Immaneni
1ac5544da7
Updated default value of maxTotalRows to reflect the value in the code (#14298) 2023-05-30 14:41:06 +05:30
Abhishek Radhakrishnan
5fd3e01ef0
More specific exclusions in the examples folder. (#14347)
This PR changes how we skip java UT and ITs with changes in the examples folder. After this change, any Markdown files within the examples folder and jupyter-notebooks directory will be excluded. The rationale behind these more specific exclusions is that some ITs use json files checked in examples, so we want to trigger the full workflow for all other changes.
2023-05-30 12:01:45 +05:30
Kashif Faraz
d4cacebf79
Add tests for CostBalancerStrategy (#14230)
Changes:
- `CostBalancerStrategyTest`
  - Focus on verification of cost computations rather than choosing servers in this test
  - Add new tests `testComputeCost` and `testJointSegmentsCost`
  - Add tests to demonstrate that with a long enough interval gap, all costs become negligible
  - Retain `testIntervalCost` and `testIntervalCostAdditivity`
  - Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded`as
verification of this behaviour is better suited to `BalancingStrategiesTest`.
- `CostBalancerStrategyBenchmark`
  - Remove usage of static method from `CostBalancerStrategyTest`
  - Explicitly setup cluster and segments to use for benchmarking
2023-05-30 08:52:56 +05:30
Kashif Faraz
8091c6a547
Update default values in CoordinatorDynamicConfig (#14269)
The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated.

1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100)
2. `replicationThrottleLimit = 500` (previous = 10)
Rationale: With round-robin segment assignment now being the default assignment technique,
the Coordinator can assign a large number of under-replicated/unavailable segments very quickly,
without getting stuck in `RunRules` duty due to very slow strategy-based cost computations.

3. `maxSegmentsToMove = 100` (previous = 5)
Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments
to balance. A very large value can cause excessive moves, which has these disadvantages:
- Load of moving segments competing with load of unavailable/under-replicated segments
- Unnecessary network costs due to constant download and delete of segments

These defaults will be revisited after #13197 is merged.
2023-05-30 08:51:33 +05:30