Commit Graph

12436 Commits

Author SHA1 Message Date
Victoria Lim 33efd5ab1d
docs: Refresh the update data tutorial (#13641)
Merging regardless of nit since topic is in better shape.

* refresh the update data tutorial

* Apply suggestions from code review

Co-authored-by: Jill Osborne <jill.osborne@imply.io>

---------

Co-authored-by: Jill Osborne <jill.osborne@imply.io>
2023-02-01 18:18:16 -08:00
Kashif Faraz f629643c50
Fix value of lookup sync period in docs (#13695)
* Fix lookup docs

* Fix spelling

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-01 18:12:00 -08:00
Sergio Ferragut 7f830b20d7
fixed init commands for both mysql and postgresql (#13713) 2023-02-01 18:07:31 -08:00
Suneet Saldanha cfc3115a59
Compaction history returns empty list instead of 404 when not found (#13730)
* Compaction history returns empty list instead of 404 when not found

* checkstyle
2023-02-01 17:44:07 -08:00
AmatyaAvadhanula 76e79c7db7
Suppress CVEs (#13733) 2023-02-01 04:18:41 -08:00
somu-imply 74ff848ce5
Fixing incorrect filtering of nulls in an array when ingesting for JSON and Avro (#13712) 2023-02-01 04:15:08 -08:00
Tejaswini Bandlamudi c95a26cae3
Migrate ITs from Travis to GHA (#13681) 2023-02-01 03:31:29 -08:00
Jason Koch 7a3bd89a85
Dimension dictionary reduce locking (#13710)
* perf: introduce benchmark for StringDimensionIndexer

jdk11 -- Benchmark                                                       Mode  Cnt      Score     Error  Units
StringDimensionIndexerProcessBenchmark.parallelReadWrite                 avgt   10  30471.552 ±  456.716  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  avgt   10  18069.863 ±  327.923  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  avgt   10  67676.617 ± 2351.311  us/op
StringDimensionIndexerProcessBenchmark.soloReader                        avgt   10   1048.079 ±    1.120  us/op
StringDimensionIndexerProcessBenchmark.soloWriter                        avgt   10   4629.769 ±   29.353  us/op

* perf: switch DimensionDictionary to StampedLock

jdk11 - Benchmark                                                        Mode  Cnt      Score      Error  Units
StringDimensionIndexerProcessBenchmark.parallelReadWrite                 avgt   10  37958.372 ± 1685.206  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  avgt   10  31192.232 ± 2755.365  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  avgt   10  58256.791 ± 1998.220  us/op
StringDimensionIndexerProcessBenchmark.soloReader                        avgt   10   1079.440 ±    1.753  us/op
StringDimensionIndexerProcessBenchmark.soloWriter                        avgt   10   4585.690 ±   13.225  us/op

* perf: use optimistic locking in DimensionDictionary

jdk11 - Benchmark                                                        Mode  Cnt      Score     Error  Units
StringDimensionIndexerProcessBenchmark.parallelReadWrite                 avgt   10   6212.366 ± 162.684  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  avgt   10   1807.235 ± 109.339  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  avgt   10  19427.759 ± 611.692  us/op
StringDimensionIndexerProcessBenchmark.soloReader                        avgt   10    194.370 ±   1.050  us/op
StringDimensionIndexerProcessBenchmark.soloWriter                        avgt   10   2871.423 ±  14.426  us/op

* perf: refactor DimensionDictionary null handling to need fewer locks

jdk11 - Benchmark                                                        Mode  Cnt      Score      Error  Units
StringDimensionIndexerProcessBenchmark.parallelReadWrite                 avgt   10   6591.619 ±  470.497  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  avgt   10   1387.338 ±  144.587  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  avgt   10  22204.462 ± 1620.806  us/op
StringDimensionIndexerProcessBenchmark.soloReader                        avgt   10    204.911 ±    0.459  us/op
StringDimensionIndexerProcessBenchmark.soloWriter                        avgt   10   2935.376 ±   12.639  us/op

* perf: refactor DimensionDictionary add handling to do a little less work

jdk11 - Benchmark                                                        Mode  Cnt      Score    Error  Units
StringDimensionIndexerProcessBenchmark.parallelReadWrite                 avgt   10   2914.859 ± 22.519  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  avgt   10    508.010 ± 14.675  us/op
StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  avgt   10  10135.408 ± 82.745  us/op
StringDimensionIndexerProcessBenchmark.soloReader                        avgt   10    205.415 ±  0.158  us/op
StringDimensionIndexerProcessBenchmark.soloWriter                        avgt   10   3098.743 ± 23.603  us/op
2023-02-01 02:59:12 -08:00
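
The benchmark tables in the commit above (#13710) are easier to compare as ratios. The sketch below copies the first (baseline) and last (final) jdk11 scores from that message and divides them; the dictionary and variable names are only illustrative, and the ratios ignore the reported error margins.

```python
# Average time per operation (us/op), copied from the first and last
# benchmark tables of the commit message. Lower is better.
baseline = {
    "parallelReadWrite": 30471.552,
    "soloReader": 1048.079,
    "soloWriter": 4629.769,
}
final = {
    "parallelReadWrite": 2914.859,
    "soloReader": 205.415,
    "soloWriter": 3098.743,
}

# Speedup factor = baseline time / final time.
speedup = {name: baseline[name] / final[name] for name in baseline}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x faster")
```

On these numbers the parallel read/write case improves by roughly an order of magnitude, while the solo writer improves far less.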
Clint Wylie ec1e6ac840
fix nested column handling of null and "null" (#13714)
* fix nested column handling of null and "null"
* fix issue merging nested column value dictionaries that could incorrectly lose dictionary values
2023-01-31 20:59:19 -08:00
Tijo Thomas 1beef30bb2
Support postaggregation function as in Math.pow() (#13703) (#13704)
Support postaggregation function as in Math.pow()
2023-01-31 22:55:04 +05:30
Adarsh Sanjeev 51dfde0284
Add maxInputBytesPerWorker as query context parameter (#13707)
* Add maxInputBytesPerWorker as query context parameter

* Move documentation to msq specific docs

* Update tests

* Spacing

* Address review comments

* Fix test

* Update docs/multi-stage-query/reference.md

* Correct spelling mistake

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-31 20:55:28 +05:30
Xavier Léauté 698670c88e
update core Apache Kafka dependencies to 3.3.2 (#13717)
Release notes:
- https://downloads.apache.org/kafka/3.3.2/RELEASE_NOTES.html
2023-01-27 21:00:01 -08:00
Jill Osborne 356b0e37cf
Tutorial: Query view (#13565)
* Tutorial: Query view

* Removed duplicate file

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Updated after review

* Update docs/tutorials/tutorial-sql-query-view.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update tutorial-sql-query-view.md

Update title

* Update sidebars.json

fix merge conflict w/ sidebar

* address spelling ci

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-27 14:29:43 -08:00
Vadim Ogievetsky 3b62d7929c
Web console: Data loader should allow for multiline JSON messages in kafka (#13709)
* stricter

* data loader should allow for multi-line json

* add await

* kinesis also
2023-01-25 21:23:18 -08:00
sairam devarashetty 6164c420a1
Create update.md (#13451)
* Create update.md

Important Line highlighted

* Update docs/data-management/update.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-25 16:23:40 -08:00
317brian 9021161c8c
doc: fix markdown spacing (#13683)
* doc: fix markdown spacing

* fix spacing
2023-01-25 16:22:49 -08:00
somu-imply 17c0167248
Additional native query tests for unnest datasource (#13554)
Native tests for the unnest datasource.
2023-01-25 15:57:52 -08:00
Victoria Lim 00cee329bd
pitfall when using combining input source (#13639) 2023-01-25 12:50:19 -08:00
imply-cheddar 706b8a0227
Adjust Operators to be Pausable (#13694)
* Adjust Operators to be Pausable

This enables "merge" style operations that
combine multiple streams.

This change includes a naive implementation
of one such merge operator just to provide
concrete evidence that the refactoring is
effective.
2023-01-23 20:52:06 -08:00
Suneet Saldanha 016c881795
Add API to return automatic compaction config history (#13699)
Add a new API that returns the history of changes to the automatic compaction config, so that users can easily see what changes have been made to their auto-compaction config.

The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.
2023-01-23 13:23:45 -08:00
somu-imply 90d445536d
SQL version of unnest native druid function (#13576)
* adds the SQL component of the native unnest functionality in Druid to unnest SQL queries on a table dimension, virtual column or a constant array and convert them into native Druid queries
* unnest in SQL is implemented as a combination of Correlate (the comma join part) and Uncollect (the unnest part)
2023-01-23 12:53:31 -08:00
Rohan Garg f76acccff2
Allow using composed storage for SuperSorter intermediate data (#13368) 2023-01-24 01:02:03 +05:30
Laksh Singla a516eb1a41
Port Calcite's tests to run with MSQ (#13625)
* SQL test framework extensions

* Capture planner artifacts: logical plan, etc.
* Planner test builder validates the logical plan
* Validation for the SQL result schema (we already have
  validation for the Druid row signature)
* Better Guice integration: properties, reuse Guice modules
* Avoid need for hand-coded expr, macro tables
* Retire some of the test-specific query component creation
* Fix query log hook race condition

Co-authored-by: Paul Rogers <progers@apache.org>
2023-01-19 08:51:11 -08:00
Clint Wylie fb26a1093d
discover nested columns when using nested column indexer for schemaless ingestion (#13672)
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
2023-01-18 12:57:28 -08:00
Maytas Monsereenusorn 1582d74f37
Fix Parquet Reader so that schema-less ingestion reads all columns (#13689)
* fix stuff

* address comments
2023-01-18 12:52:12 -08:00
Paul Rogers fa493f1ebc
Convert from DRUID_INTEGRATION_TEST_INDEXER to USE_INDEXER (#13684)
The old ITs use DRUID_INTEGRATION_TEST_INDEXER. The new ones use the
USE_INDEXER env var passed in from the build environment.
2023-01-18 08:51:42 -08:00
Tejaswini Bandlamudi 7a54524076
install node on runners (#13690) 2023-01-18 16:19:57 +05:30
Eyal Yurman 44374f91bc
Fix broken links to Oracle JDK docs (#13687)
* Fix broken link for SSLContext java doc

* Update tls-support.md

* Update tls-support.md

* Update tls-support.md

* Update simple-client-sslcontext.md
2023-01-18 14:46:08 +05:30
Paul Rogers 22630b0aab
Much improved table functions (#13627)
Much improved table functions

* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
2023-01-17 08:41:57 -08:00
Benedict Jin 59dfe7bed3
Add new probe delay configurations into Helm Chart doc (#12997) 2023-01-17 22:06:24 +05:30
Abhishek Agarwal cc89c661d0
Move the tips section in PR template into comments block (#13676) 2023-01-16 17:01:20 +05:30
imply-cheddar 7ff3722cb9
Swap LazySingleton for Singleton (#13673)
* Swap LazySingleton for Singleton
* Initialize WebserverTestUtils properly
2023-01-15 21:38:37 -08:00
Paul Rogers ed623d626f
Support both Indexer and MiddleManager in ITs (#13660)
Support both indexer and MM in ITs

Support for the DRUID_INTEGRATION_TEST_INDEXER variable
Conditional client cluster configuration
Cleanup of OVERRIDE_ENV file handling
Enforce setting of test-specific env vars
Cleanup of unused bits
2023-01-14 14:34:06 -08:00
imply-cheddar 566fc990e4
Semantic Implementations for ArrayListRAC (#13652)
* Semantic Implementations for ArrayListRAC

This adds implementations of semantic interfaces
to optimize (eliminate object creation) the
window processing on top of an ArrayListSegment.

Tests are also added to cover the interplay
between the semantic interfaces that are expected
for this use case
2023-01-13 19:42:34 -08:00
Tejaswini Bandlamudi 4368b3a071
Migrate jdk8 unit tests from Travis to GHA (#13518)
* migrate UTs from Travis to GHA

* update permissions

* rename file

* set fetch depth to 1

* debugs remote branches

* test with github.ref variable

* fetch github.base_ref for diff

* nit

* test git diff

* run tests

* test code coverage failure scenario

* nit

* nit

* revert code changes

* revert code changes

* Setup diff-test-coverage before tests

* build distribution module at end in packaging check

* nit

* remove redundant steps in static-checks workflow

* drop jdk8 unit tests from Travis
2023-01-13 14:46:58 +05:30
Gian Merlino 182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
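
The memory-based defaults described in the Kinesis commit above (#13539) can be sketched as simple arithmetic. The record-size estimates (10 KB regular, 1 MB aggregated) and the maxRecordsPerPoll values come from the commit message. The heap percentages (10% for the buffer, 5% for fetches), the two-threads-per-processor multiplier, the decimal reading of "1 GB" and "10 KB", and the function name `kinesis_fetch_defaults` are all assumptions, chosen here only because they reproduce the figures the message quotes (10000 and 1250). This is not a copy of the Druid implementation.

```python
# Record-size estimates taken from the commit message (decimal bytes assumed).
REGULAR_RECORD_BYTES = 10_000        # "10 KB for regular records"
AGGREGATED_RECORD_BYTES = 1_000_000  # "1 MB for aggregated records"


def kinesis_fetch_defaults(heap_bytes, processors_per_task, aggregated,
                           buffer_heap_percent=10,   # assumption
                           fetch_heap_percent=5,     # assumption
                           threads_per_processor=2): # assumption
    """Derive hypothetical fetch defaults from available memory."""
    record_bytes = AGGREGATED_RECORD_BYTES if aggregated else REGULAR_RECORD_BYTES
    fetch_threads = max(1, processors_per_task * threads_per_processor)
    # Buffer sized as a share of the heap, measured in estimated records.
    record_buffer_size = heap_bytes * buffer_heap_percent // 100 // record_bytes
    # Fetch budget is a share of the heap split across the fetch threads.
    records_per_fetch = (heap_bytes * fetch_heap_percent // 100
                         // fetch_threads // record_bytes)
    return {
        "fetchThreads": fetch_threads,
        "recordBufferSize": record_buffer_size,
        "recordsPerFetch": records_per_fetch,
        # From the commit message: 100 if not aggregated, 1 if aggregated.
        "maxRecordsPerPoll": 1 if aggregated else 100,
    }


# The scenario quoted in the commit message: 1 GB heap, 2 processors per
# task, nonaggregated records.
defaults = kinesis_fetch_defaults(1_000_000_000, 2, aggregated=False)
print(defaults)
```

Under these assumptions the quoted scenario yields a recordBufferSize of 10000 and a recordsPerFetch of 1250, matching the values in the message.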
Clint Wylie b5b740bbbb
allow using nested column indexer for schema discovery (#13653)
* single typed "root" only nested columns now mimic "regular" columns of those types
* incremental index can now use nested column indexer instead of string indexer for discovered columns
2023-01-12 18:31:12 -08:00
Vadim Ogievetsky 93dc01b6c5
fix broken table missing new line (#13666) 2023-01-12 15:29:51 -08:00
Adarsh Sanjeev cb16a7f6a9
Fix behaviour of downsampling buckets to a single key (#13663) 2023-01-12 21:24:24 +05:30
Adarsh Sanjeev 0a486c3bcf
Update forbidden apis with fixed executor (#13633)
* Update forbidden apis with fixed executor
2023-01-12 15:34:36 +05:30
Adarsh Sanjeev afb3d91777
Add unit test for complex column grouping (#13650)
* Add unit test for complex column grouping

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-12 15:25:01 +05:30
Maytas Monsereenusorn 7f54ebbf47
Fix Parquet Parser missing column when reading parquet file (#13612)
* fix parquet reader

* fix checkstyle

* fix bug

* fix inspection

* refactor

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* fix checkstyle

* fix tests

* add IT

* add IT

* add more tests

* fix checkstyle

* fix stuff

* fix stuff

* add more tests

* add more tests
2023-01-11 20:08:48 -10:00
Vadim Ogievetsky f97bcc69d3
Docs: reword single server page (#13659)
* reword single server page

* fix typo

* Update docs/operations/single-server.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-11 21:12:52 -08:00
Rishabh Singh a83d1cdf26
fix var name (#13657) 2023-01-11 21:15:30 +05:30
abhagraw 5ef689fc3f
Cloud deep storage tests in new IT framework (S3, GCS, Azure) (#13535)
* MSQ s3 deep storage tests

* Fix license check

* Getting config values from env variables

* Added s3TestUtils

* Merged AbstractITSQLBasedIngestionTest with AbstractITBatchIndexTest

* Fixing license issues

* Fixing checkstyle errors

* Fix spotbug errors

* Update s3util name in other files

* GCS and Azure deep storage tests

* Fix license and checkstyle errors

* Fix dependency error

* fix intellij check errors

* Copy credentials file in all containers

* Refactor and gcs file upload fix

* Fixing dependency check errors and codeQL warnings

* Fixing checkstyle errors

* Fixing intellij inspection errors

* Removing unrequired exceptions

* Addressing comments
2023-01-11 09:43:44 +05:30
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Addressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Abhishek Agarwal 17936e2920
Add an option to enable HSTS in druid services (#13489)
* Add an option to enable HSTS

* Fix code and add docs

* Deduplicate headers

* unused import

* Fix spelling
2023-01-10 22:31:51 +05:30
Dongjoon Hyun 2503095296
Publish SBOM artifacts (#13648) 2023-01-10 16:08:10 +05:30
abhagraw 74a76c74b1
Updating dependency check version (#13649) 2023-01-10 14:43:19 +05:30
Abhishek Radhakrishnan 41fdf6eafb
Quote and escape literals in JDBC lookup to allow reserved identifiers. (#13632)
* Quote and escape table, key and column names.

* fix typo.

* More select statements.

* Derby lookup tests create quoted identifiers so it's compatible.

* Use StringUtils.replace() utility.

* quote the filter string.

* Squish double-quote usage into a single function.

* Add parameterized test with reserved identifiers.

* few changes.
2023-01-10 12:11:54 +05:30
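
The idea behind the JDBC lookup commit above (#13632), quoting and escaping table, key and column names so that reserved words can be used as identifiers, can be sketched as follows. The sketch assumes standard SQL double-quote identifier quoting and uses an in-memory SQLite database purely for demonstration. The helper names `quote_identifier` and `build_lookup_query` are hypothetical and are not taken from the Druid code base.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_lookup_query(table: str, key_column: str, value_column: str) -> str:
    """Build a SELECT whose identifiers are all quoted in one place."""
    return "SELECT {k}, {v} FROM {t}".format(
        k=quote_identifier(key_column),
        v=quote_identifier(value_column),
        t=quote_identifier(table),
    )


# "select" and "key" are reserved words; quoting makes them usable as names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE {t} ({k} TEXT, {v} TEXT)".format(
        t=quote_identifier("select"),
        k=quote_identifier("key"),
        v=quote_identifier("value"),
    )
)
conn.execute(
    "INSERT INTO {t} VALUES (?, ?)".format(t=quote_identifier("select")),
    ("a", "1"),
)
rows = conn.execute(build_lookup_query("select", "key", "value")).fetchall()
print(rows)
```

Keeping the quoting in a single helper means every generated statement escapes identifiers the same way. Note that some databases use a different quote character by default, so a real implementation would need to account for the target dialect.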