11810 Commits

Author SHA1 Message Date
William Hyun
2aadd69f54
Update ORC to 1.7.5 (#12667) 2022-06-24 16:08:42 -07:00
Gian Merlino
d5abd06b96
Fix flaky KafkaIndexTaskTest. (#12657)
* Fix flaky KafkaIndexTaskTest.

The testRunTransactionModeRollback case had many race conditions. Most notably,
it would commit a transaction and then immediately check to see that the results
were *not* indexed. This is racy because it relied on the indexing thread being
slower than the test thread.

Now, the case waits for the transaction to be processed by the indexing thread
before checking the results.

* Changes from review.
2022-06-24 13:53:51 -07:00
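Note on #12657: the fix described above is a wait-then-assert pattern. The test code itself is not shown in this log, so the following is only a minimal sketch, assuming a `CountDownLatch` as the signal from the indexing thread; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not code from KafkaIndexTaskTest.
final class WaitThenAssertSketch {
    /** Starts a stand-in "indexing thread", waits for its signal, then reads its output. */
    static List<String> processThenRead(String record) {
        List<String> indexed = new CopyOnWriteArrayList<>();
        CountDownLatch processed = new CountDownLatch(1);

        Thread indexer = new Thread(() -> {
            indexed.add(record);     // the work the test wants to observe
            processed.countDown();   // signal: the record has been handled
        });
        indexer.start();

        try {
            // Block on the signal instead of relying on which thread happens to be faster.
            if (!processed.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexer did not process the record in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(indexed);
    }

    public static void main(String[] args) {
        System.out.println(processThenRead("row-1"));
    }
}
```

Only after `await` returns is it safe to assert on what was or was not indexed.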
Didip Kerabat
6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream producers can write to any folder they want. In this situation S3InputSource is unusable.

Most people, like me, solved this by using Airflow to fetch the full list of parquet files and pass it to Druid. But doing this explodes the JSON spec. We had a situation where one of the JSON specs was 16MB, and that's simply too much for the Overlord.

This patch allows users to pass {"filter": "*.parquet"} and lets Druid perform the filtering of the input files.

I am using the glob notation to be consistent with the LocalFirehose syntax.
2022-06-24 11:40:08 +05:30
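Note on #12659: to illustrate what a glob filter such as `*.parquet` does to a list of object keys, here is a minimal sketch using the JDK's own glob matcher. It is not Druid's implementation; matching against the last path segment only is an assumption made for this example, and the class name is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not the code added to the cloud input sources.
final class GlobFilterSketch {
    /** Keeps only the keys whose file name (last path segment) matches the glob. */
    static List<String> filter(List<String> objectKeys, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return objectKeys.stream()
            .filter(key -> matcher.matches(Paths.get(key).getFileName()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = java.util.Arrays.asList("in/a.parquet", "in/_SUCCESS", "in/b.json");
        System.out.println(filter(keys, "*.parquet"));
    }
}
```

With a filter like this, the ingestion spec can name a folder instead of listing every file.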
Tejaswini Bandlamudi
1fc2f6e4b0
Throw BadQueryContextException if context params cannot be parsed (#12680) 2022-06-24 09:21:25 +05:30
Gian Merlino
d29343cbe3
Disable autokill of segments by default. (#12693)
Also add clarifying commentary to the documentation about how durationToRetain works.
2022-06-23 17:17:11 -07:00
Paul Rogers
ffcb996468
Cleanup changes pulled out of PR #12368 (#12672)
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
2022-06-23 23:19:50 +05:30
Jihoon Son
3d9e3dbad9
Fix hadoop library location for integration tests (#12497) 2022-06-23 10:39:54 -05:00
Gian Merlino
4d892483ca
Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658)
The TestEmitter is used from different threads without concurrency
control. This patch makes the emitter thread-safe.
2022-06-22 22:29:16 -07:00
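Note on #12658: the log does not say how the emitter was made thread-safe. One common approach, sketched here under that caveat, is to back the event list with a concurrent collection; the class below is hypothetical and is not Druid's TestEmitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration; not Druid's TestEmitter.
final class ThreadSafeEmitterSketch {
    private final List<String> events = new CopyOnWriteArrayList<>();

    void emit(String event) {
        events.add(event); // safe to call from any thread
    }

    List<String> getEvents() {
        return new ArrayList<>(events); // snapshot for assertions
    }

    /** Emits from several threads at once and returns how many events were recorded. */
    static int emitConcurrently(int threads, int perThread) {
        ThreadSafeEmitterSketch emitter = new ThreadSafeEmitterSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    emitter.emit("event");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return emitter.getEvents().size();
    }
}
```

With a plain `ArrayList` in place of the concurrent list, the same run could lose events or throw.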
Kashif Faraz
b6f8d7a1b3
Add query context param forceExpressionVirtualColumns to always use "expression"-type virtual columns in query plan (#12583)
SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE`
are planned as specialized virtual columns instead of the default `expression`-type virtual columns.
This commit adds a new context parameter to force the `expression`-type virtual columns.

Changes
- Add query context param `forceExpressionVirtualColumns`
- Use context param to determine if specialized virtual columns should be used or not
- Moved some tests into `CalciteExplainQueryTest`
2022-06-22 15:33:50 +05:30
AmatyaAvadhanula
6bcb778eeb
Add CVEs for Hadoop3 (#12336)
* Add CVEs

* Move CVEs under hadoop3 section
2022-06-22 14:12:17 +05:30
Tejaswini Bandlamudi
99e1b4efee
Update default value of inputSegmentSizeBytes in configuration docs (#12678) 2022-06-22 09:05:03 +05:30
Gian Merlino
0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
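Note on #12662: the message says the operator is equivalent to using `>=` and `<` directly, that is, the interval is closed at its start and open at its end. A minimal sketch of that check, assuming the interval is written as an ISO-8601 `start/end` pair of full instants; the class is hypothetical and unrelated to the convertlet itself.

```java
import java.time.Instant;

// Hypothetical illustration of the half-open interval check; not the SQL convertlet.
final class TimeInIntervalSketch {
    /** True when start <= t < end for an ISO-8601 "start/end" interval. */
    static boolean timeInInterval(Instant t, String isoInterval) {
        String[] parts = isoInterval.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start/end, got: " + isoInterval);
        }
        Instant start = Instant.parse(parts[0]);
        Instant end = Instant.parse(parts[1]);
        return !t.isBefore(start) && t.isBefore(end); // t >= start AND t < end
    }

    public static void main(String[] args) {
        String june = "2022-06-01T00:00:00Z/2022-07-01T00:00:00Z";
        System.out.println(timeInInterval(Instant.parse("2022-06-15T12:00:00Z"), june));
    }
}
```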
AmatyaAvadhanula
eccdec9139
Reduce interval creation cost for segment cost computation (#12670)
Changes:
- Reuse created interval in `SegmentId.getInterval()`
- Intern intervals to save on memory footprint
2022-06-21 17:39:43 +05:30
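Note on #12670: interning means that equal values share one canonical instance, so repeated intervals cost memory only once. A minimal sketch, assuming a simple strong-reference pool; the `Interval` type here is a hypothetical stand-in, and the interner the commit actually uses is not shown in this log.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration; not the interner used by SegmentId.
final class IntervalInternerSketch {
    /** Stand-in value type with value-based equality. */
    static final class Interval {
        final long startMillis;
        final long endMillis;

        Interval(long startMillis, long endMillis) {
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Interval)) {
                return false;
            }
            Interval other = (Interval) o;
            return startMillis == other.startMillis && endMillis == other.endMillis;
        }

        @Override
        public int hashCode() {
            return Objects.hash(startMillis, endMillis);
        }
    }

    private static final ConcurrentMap<Interval, Interval> POOL = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to the candidate, registering it if it is new. */
    static Interval intern(Interval candidate) {
        Interval existing = POOL.putIfAbsent(candidate, candidate);
        return existing == null ? candidate : existing;
    }
}
```

A strong-reference pool like this never releases entries; a production interner would typically hold them weakly.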
Tejaswini Bandlamudi
a85b1d8985
Lazy Initialisation of Orc extensions module (#12663)
* Lazy initialization of Orc extension

* nit

* moving initialize method to OrcInputFormat
2022-06-21 11:13:10 +05:30
Gian Merlino
818974f6e4
ScanQuery: Fix JsonIgnore for isLegacy. (#12674)
True, false, and null have different meanings: true/false mean "legacy"
and "not legacy"; null means use the default set by ScanQueryConfig.
So, we need to respect this in the JsonIgnore setup.
2022-06-18 15:55:54 -07:00
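Note on #12674: the point of the message is that the flag has three states, so serialization may drop it only when it is null. A minimal sketch of that rule in plain Java, without Jackson; the class and method names are hypothetical, and the actual annotation change is not shown in this log.

```java
// Hypothetical illustration of tri-state handling; not ScanQuery itself.
final class TriStateLegacySketch {
    /** null means "use the configured default"; true and false are explicit choices. */
    static boolean resolveLegacy(Boolean legacy, boolean configDefault) {
        return legacy == null ? configDefault : legacy;
    }

    /** Omits the field only for null, so an explicit false is not lost on a round trip. */
    static String toJsonField(Boolean legacy) {
        return legacy == null ? "" : "\"legacy\":" + legacy;
    }

    public static void main(String[] args) {
        System.out.println(toJsonField(Boolean.FALSE));
        System.out.println(resolveLegacy(null, true));
    }
}
```

If an explicit `false` were dropped during serialization, the receiving side would fall back to the default, which may be `true`.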
Gian Merlino
e76a5077ef
Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669)
* Fix self-referential shape inspection in BaseExpressionColumnValueSelector.

The new test would throw StackOverflowError on the old code.

* Restore prior test.
2022-06-17 16:15:50 -07:00
Clint Wylie
18937ffee2
split out null value index (#12627)
* split out null value index

* gg spotbugs

* fix stuff
2022-06-17 15:29:23 -07:00
Paul Rogers
893759de91
Remove null and empty fields from native queries (#12634)
* Remove null and empty fields from native queries

* Test fixes

* Attempted IT fix.

* Revisions from review comments

* Build fixes resulting from changes suggested by reviews

* IT fix for changed segment size
2022-06-16 14:07:25 -07:00
Jill Osborne
f050069767
Segments doc update (#12344)
* Corrected heading levels in segments doc

* IMPLY-18394: Updated Segments doc

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update segments.md

* Updated links to changed headings in Segments doc

* Corrected spelling error

* Update segments.md

Incorporated suggestions from Paul Rogers.

* Update index.md

* Update segments.md

* Update segments.md

* Update segments.md

* Update compaction.md

* Update docs/design/segments.md

fix typo

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-06-16 13:25:17 -07:00
AmatyaAvadhanula
f970757efc
Optimize overlord GET /tasks memory usage (#12404)
The web console (indirectly) calls the Overlord's GET tasks API to fetch the task summaries, which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or Overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API).

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (apache/druid issue #12318, "Metadata store query performance degrades as the tasks in druid_tasks table grows").

The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
2022-06-16 22:30:37 +05:30
Lucas Capistrant
602d95d865
Add a builder class for TestDruidCoordinatorConfig (#12624)
* Add a builder class for TestDruidCoordinatorConfig

* updates after review

* Fix formatting
2022-06-16 09:11:31 -05:00
Victoria Lim
94564b6ce6
Update screenshots for Druid console doc (#12593)
* druid console doc updates

* remove extra image

* Apply suggestions from code review

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* updated screenshot labels

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-15 16:42:20 -07:00
Gian Merlino
70f3b13621
ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592)
* ForkingTaskRunner: Set ActiveProcessorCount for tasks.

This prevents various automatically-sized thread pools from being unreasonably
large (we don't want each task to size its pools as if it is the only thing on
the entire machine).

* Fix tests.

* Add missing LifecycleStart annotation.

* ForkingTaskRunner needs ManageLifecycle.
2022-06-15 15:56:32 -07:00
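Note on #12592: `-XX:ActiveProcessorCount=N` is a HotSpot option that overrides the number of processors the JVM reports, which is what automatically sized thread pools read. A minimal sketch of building a child JVM command with it; dividing the machine's processors evenly across task slots is an assumption for this example, not necessarily the policy ForkingTaskRunner applies, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not ForkingTaskRunner.
final class ActiveProcessorCountSketch {
    /** Builds a child JVM command line that caps the processor count seen by the task. */
    static List<String> buildCommand(int machineProcessors, int taskSlots, String mainClass) {
        if (machineProcessors < 1 || taskSlots < 1) {
            throw new IllegalArgumentException("processor and slot counts must be positive");
        }
        // Assumed policy: an even share per task, never less than one processor.
        int perTask = Math.max(1, machineProcessors / taskSlots);

        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-XX:ActiveProcessorCount=" + perTask);
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(buildCommand(processors, 4, "example.TaskMain"));
    }
}
```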
Paul Rogers
45e3111549
Clean up query contexts (#12633)
* Clean up query contexts

Uses constants in place of literal strings for context keys.
Moves some QueryContext methods to QueryContexts for reuse.

* Revisions from review comments
2022-06-15 11:31:22 -07:00
Rohan Garg
28f2c8e112
Support LoadScope for Peons + Access Modifier Updates (#12640)
* Support LoadScope for Peons

* Update access modifiers for GroupByEngineV2
2022-06-14 21:52:50 -07:00
Gian Merlino
283249c51b
NettyHttpClient: Fix double-return on certain exceptions. (#12626)
The "exceptionCaught" handler may get called multiple times. We should
only return the channel to the pool the first time. Returning it more
than once leads to a warning like "Resource at key[%s] was returned
multiple times?"
2022-06-14 21:40:47 -07:00
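Note on #12626: a handler that may fire several times can be made to release its resource exactly once with a compare-and-set guard. A minimal sketch of that idea; the counter stands in for the channel pool, the names are hypothetical, and the guard NettyHttpClient actually uses is not shown in this log.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not NettyHttpClient.
final class ReturnOnceSketch {
    private final AtomicBoolean returned = new AtomicBoolean(false);
    private final AtomicInteger poolReturns; // stand-in for the channel pool

    ReturnOnceSketch(AtomicInteger poolReturns) {
        this.poolReturns = poolReturns;
    }

    /** May be called many times; only the first call returns the channel to the pool. */
    void exceptionCaught(Throwable cause) {
        if (returned.compareAndSet(false, true)) {
            poolReturns.incrementAndGet();
        }
    }

    /** Fires the handler the given number of times and reports how often the pool was touched. */
    static int returnsAfter(int calls) {
        AtomicInteger pool = new AtomicInteger();
        ReturnOnceSketch handler = new ReturnOnceSketch(pool);
        for (int i = 0; i < calls; i++) {
            handler.exceptionCaught(new RuntimeException("failure " + i));
        }
        return pool.get();
    }
}
```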
Gian Merlino
1f6e888472
Add QoSFilters first in the chain. (#12625)
* Add QoSFilters first in the chain.

When a request is suspended and later resumed due to QoS constraints,
its filter chain is restarted. Placing QoSFilters first in the chain
avoids double-execution of other filters.

Fixes an issue where requests deferred by QoS would report 403 Forbidden
due to double-execution of SecuritySanityCheckFilter.

* Smaller changes.

* Add QoS filters in BaseJettyTest.

* Remove unused parameter.
2022-06-14 13:37:00 -07:00
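Note on #12625: the effect of filter ordering can be seen in a toy model where a request is suspended once at the QoS filter and its chain is then restarted from the top. This is only a simulation under that assumption, with hypothetical names; it does not use the servlet or Jetty APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a restartable filter chain; not Jetty's QoSFilter.
final class FilterChainSketch {
    /**
     * Walks the chain in order. The first time the "qos" filter is reached, the request is
     * suspended and the chain restarts from the top. Returns the filters that fully executed.
     */
    static List<String> run(List<String> chain) {
        List<String> executed = new ArrayList<>();
        boolean resumed = false;
        int i = 0;
        while (i < chain.size()) {
            String name = chain.get(i);
            if (name.equals("qos") && !resumed) {
                resumed = true; // suspend once, then restart the whole chain
                i = 0;
                continue;
            }
            executed.add(name);
            i++;
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run(java.util.Arrays.asList("sanity", "qos", "auth")));
        System.out.println(run(java.util.Arrays.asList("qos", "sanity", "auth")));
    }
}
```

In the model, any filter placed before `qos` runs twice, while with `qos` first every filter runs once.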
Gian Merlino
ceb4ace118
NettyHttpClient: Replace ReadTimeoutException with our own exception. (#12635)
* NettyHttpClient: Replace ReadTimeoutException with our own exception.

* Replace exception with same type.

* Remove unused import.
2022-06-14 13:34:46 -07:00
Vadim Ogievetsky
6f7fa334fd
Web console: totalNumMergeTasks can be set on range also (#12648)
* totalNumMergeTasks can be set on range also

* fix formatting
2022-06-14 11:18:17 -07:00
Atul Mohan
68bae6eafb
Fix version in master (#12644) 2022-06-14 11:32:46 +05:30
Rohan Garg
afaea251f2
Push join build table values as filter in case of duplicates (#12225)
* Push join build table values as filter

* Add tests for JoinableFactoryWrapper

* fixup! Push join build table values as filter

* fixup! Add tests for JoinableFactoryWrapper

* fixup! Push join build table values as filter
2022-06-13 17:18:27 -07:00
317brian
27e8b43673
fix: update footer copyright year (#12594) 2022-06-13 16:29:58 -07:00
Gian Merlino
1ace7336cd
Update node to 14.19.3. (#12632) 2022-06-10 10:18:12 -07:00
Victoria Lim
353475bd36
Docs for automatic compaction (#12569)
* docs for auto-compaction

* fix broken links

* another link

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>

* reorg content for skipOffset

* Update docs/ingestion/automatic-compaction.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-06-09 14:55:12 -07:00
TSFenwick
a3603ad6b0
Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613)
Fixes an issue where SQL query request logs do not include the default query context
values set via `druid.query.default.context.xyz` runtime properties.

# Change summary
* Inject `DefaultQueryConfig` into `SqlLifecycleFactory`
* Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle`

# Description
- This change does not affect query execution. This is because the
  `DefaultQueryConfig` was already being used in `QueryLifecycle`,
   which is initialized when the SQL is translated to a native query. 
- This also handles any potential use case where a context parameter should be
   handled at the SQL stage itself.
2022-06-08 12:52:50 +05:30
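Note on #12613: adding default parameters to a query context amounts to a map merge. A minimal sketch, assuming that values supplied with the query take precedence over the configured defaults; the class is hypothetical, and the keys and values are only examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not SqlLifecycle or DefaultQueryConfig.
final class DefaultContextMergeSketch {
    /** Starts from the defaults, then lets the user's context override them (assumed precedence). */
    static Map<String, Object> withDefaults(Map<String, Object> userContext, Map<String, Object> defaults) {
        Map<String, Object> merged = new LinkedHashMap<>(defaults);
        merged.putAll(userContext);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("timeout", 30000);
        defaults.put("useCache", true);

        Map<String, Object> user = new LinkedHashMap<>();
        user.put("timeout", 5000);

        System.out.println(withDefaults(user, defaults));
    }
}
```

Logging the merged map rather than the user's map alone is what makes the defaults visible in request logs.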
Gian Merlino
8fbf92e047
SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)
* SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments.

Segments with endpoints prior to year 0 or after year 9999 may overlap
the search intervals but not match the generated SQL conditions. So, we
need to add an additional OR condition to catch these.

I checked a real, live MySQL metadata store to confirm that the query
still uses metadata store indexes. It does.

* Add comments.
2022-06-07 11:33:46 -07:00
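Note on #12600: the problem can be reproduced with plain string comparison. The sketch below assumes that segment endpoints are compared as ISO-8601 strings, which orders correctly only for four-digit years; the class is hypothetical, the wide endpoints are illustrative values, and the additional OR condition from the commit is not reproduced here.

```java
// Hypothetical illustration of why string-based overlap checks miss very wide segments.
final class WideIntervalSketch {
    /** Overlap test using lexicographic comparison of ISO-8601 timestamps. */
    static boolean overlapsByString(String segStart, String segEnd, String searchStart, String searchEnd) {
        return segStart.compareTo(searchEnd) < 0 && segEnd.compareTo(searchStart) > 0;
    }

    public static void main(String[] args) {
        String searchStart = "2022-01-15T00:00:00.000Z";
        String searchEnd = "2022-01-20T00:00:00.000Z";

        // An ordinary segment: string order agrees with time order.
        System.out.println(overlapsByString(
            "2022-01-01T00:00:00.000Z", "2022-02-01T00:00:00.000Z", searchStart, searchEnd));

        // A very wide segment: its end year has more than four digits, so as a string it
        // sorts before "2022-..." even though it is far later in time.
        System.out.println(overlapsByString(
            "-146136543-09-08T08:23:32.096Z", "146140482-04-24T15:36:27.903Z", searchStart, searchEnd));
    }
}
```

The second check fails even though the wide segment covers the whole search interval, which is the case the extra condition is meant to catch.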
Abhishek Agarwal
59a0c10c47
Add remedial information in error message when type is unknown (#12612)
Users often submit queries and ingestion specs that work only if the relevant extension is loaded. However, when the extension is missing, the error is too technical and doesn't suggest checking for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is caused by a bug.
2022-06-07 20:22:45 +05:30
Laksh Singla
81c37c6515
Add validation for invalid partitioned by granularities (#12589)
* Add validation for invalid partitioned by granularities

* review comments

* improve error message, change location of the method

* remove imports

* use StringUtils.lowercase

Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
2022-06-06 22:00:29 +05:30
Adarsh Sanjeev
5a283964ca
Improve SQL validation error messages (#12611)
Update the SQL validation error message to specify whether
the ingest is INSERT or REPLACE for better user experience.
2022-06-06 16:14:28 +05:30
Gian Merlino
abf0e0a159
CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605)
Closer is not thread-safe, so we need one per thread in the
concurrency tests.
2022-06-04 10:57:13 -07:00
Gian Merlino
a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Victoria Lim
1506b26ce4
fix typo (#12607) 2022-06-04 13:14:18 +08:00
Gian Merlino
a27f4f5740
Service stdout log files, move logs to log/. (#12570)
* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. Makes them
   easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.
2022-06-03 10:44:29 +05:30
Jill Osborne
9c8e6bb000
Addition to Multitenancy considerations doc (#12567)
* Small addition to Multitenancy considerations doc

* Update docs/querying/multitenancy.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update multitenancy.md

Edit suggested by @kfaraz

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-02 10:32:14 -07:00
dependabot[bot]
4558b815e5
Bump eventsource from 1.1.0 to 1.1.1 in /web-console (#12595)
Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 22:04:30 -07:00
dependabot[bot]
c49277bd2b
Bump eventsource from 1.0.7 to 1.1.1 in /website (#12596)
Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 22:04:04 -07:00
Clint Wylie
98f6bca2cd
fix regression with ipv4_match and prefixes (#12542)
* fix issue with ipv4_match and prefixes
2022-06-01 14:03:08 -07:00
dependabot[bot]
23b9a6f9eb
Bump lodash from 4.17.15 to 4.17.21 in /website (#12409)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.21)

---
updated-dependencies:
- dependency-name: lodash
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 13:56:22 -07:00
dependabot[bot]
86d01b3681
Bump opentelemetry-instrumentation-bom-alpha (#12531)
Bumps [opentelemetry-instrumentation-bom-alpha](https://github.com/open-telemetry/opentelemetry-java-instrumentation) from 1.7.0-alpha to 1.14.0-alpha.
- [Release notes](https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-java-instrumentation/commits)

---
updated-dependencies:
- dependency-name: io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 13:51:39 -07:00
Clint Wylie
31f988ec76
fix backwards compatibility for explicit null columns (#12585) 2022-06-01 12:39:48 -07:00