12641 Commits

Author SHA1 Message Date
Paul Rogers 5dadbdf4d0
Generate the IT docker-compose.yaml files (#13669)
Generate IT docker-compose.yaml files

Generates test-specific docker-compose.yaml files using a simple
Python template script.
2023-02-21 15:03:02 -08:00
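
As a rough illustration of the template approach described in #13669, the sketch below renders a test-specific compose file from a shared template with Python's standard `string.Template`. The template text, the `render_compose` helper, the `TEST_GROUP` variable, and the service and image names are all hypothetical; none of them are taken from the Druid scripts.

```python
from string import Template

# Hypothetical shared template; the real IT templates are not shown in this log.
COMPOSE_TEMPLATE = Template("""\
services:
  $service:
    image: $image
    environment:
      - TEST_GROUP=$test_group
""")


def render_compose(service: str, image: str, test_group: str) -> str:
    """Fill the template with test-specific values and return the YAML text."""
    return COMPOSE_TEMPLATE.substitute(
        service=service, image=image, test_group=test_group
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(render_compose("broker", "example/druid:test", "example-group"))
```

`substitute` raises `KeyError` when a placeholder is left unfilled, which is a reasonable default for generated test configuration.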
benkrug c6b1576fc1
Update clean-metadata-store.md (#13131)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-02-21 12:53:54 -08:00
Clint Wylie 614205f3bc
fix some intellij inspections in druid-processing (#13823)
fix some intellij inspections in druid-processing
2023-02-21 09:02:02 +05:30
Lucas Capistrant 46eafa57e1
Improve client change counter management in HTTP Server View (#13010)
* Avoid calling resolveWaitingFutures if there are no changes made

* Avoid telling an HTTP server view client to reset its counter when the counter is valid
2023-02-20 17:32:27 +05:30
Tejaswini Bandlamudi e788f1ae6b
Add option to run standard & revised ITs manually on PRs (#13814)
Also create the Docker image when restoring the Maven dependencies cache fails, because the env.sh file is removed on a Maven rebuild.
Increase java heap size for security IT failing with error
2023-02-20 16:15:15 +05:30
Gian Merlino 882ae9f002
Speed up composite key joins on IndexedTable. (#13516)
* Speed up composite key joins on IndexedTable.

Prior to this patch, IndexedTable indexes are sorted IntList. This works
great when we have a single-column join key: we simply retrieve the list
and we know what rows match. However, when we have a composite key, we
need to merge the sorted lists. This is inefficient when one is very dense
and others are very sparse.

This patch switches from sorted IntList to IntSortedSet, and changes
to the following intersection algorithm:

1) Initialize the intersection set to the smallest matching set from the
   various parts of the composite key.

2) For each element in that smallest set, check other sets for that element.
   If any do *not* include it, then remove the element from the intersection
   set.

This way, complexity scales with the size of the smallest set, not the
largest one.

* RangeIntSet stuff.
2023-02-17 22:01:01 -08:00
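
The intersection strategy described in #13516 can be sketched roughly as follows. This is a minimal illustration, not the Druid implementation: plain Python sets stand in for `IntSortedSet`, and the function name `intersect_smallest_first` is hypothetical.

```python
from typing import Iterable, Set


def intersect_smallest_first(row_sets: Iterable[Set[int]]) -> Set[int]:
    """Intersect row-number sets, driving the loop from the smallest one."""
    sets = sorted(row_sets, key=len)
    if not sets:
        return set()
    smallest, others = sets[0], sets[1:]
    # Step 1: start from a copy of the smallest set so the index is not mutated.
    result = set(smallest)
    # Step 2: drop any row that is missing from at least one other set.
    for row in smallest:
        if any(row not in other for other in others):
            result.discard(row)
    return result


if __name__ == "__main__":
    dense = set(range(1_000))   # one key part matches many rows
    sparse = {3, 500, 2_000}    # another key part matches few rows
    print(sorted(intersect_smallest_first([dense, sparse])))
```

The loop runs once per element of the smallest set, so the dense set is only probed for membership and never scanned.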
Paul Rogers 85d36be085
Information schema now uses numeric column types (#13777)
Change to use SQL schemas to allow null numeric columns

* Updated docs
2023-02-17 14:39:31 -08:00
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Abhishek Agarwal 8d03ace1b4
Use K3S instead of minikube for integration tests (#13782)
We are seeing failures on GHA while using minikube so switching to K3S instead.
2023-02-17 23:06:30 +05:30
Katya Macedo bc8b710b7e
Fix broken link (#13767) 2023-02-17 09:02:12 -08:00
Abhishek Agarwal ddae7974c2
Don't run UTs, ITs when there are only docs, helm and console changes. (#13812)
* Don't run UTs, ITs for docs, helm and console changes

* Test

* right place

* Revert test

* Remove ITs from ignore list

* Update unit-and-integration-tests-unified.yml

* Update unit-and-integration-tests-unified.yml
2023-02-17 09:56:17 +05:30
317brian add2081f7c
gha: add auto labeler for doc prs (#13791)
This adds an autolabeler GitHub action that will label any changes to the docs directories (docs and the Jupyter nb directory) as Area - Documentation.

The GHA will also remove the label if the PR changes and no longer touches files in those directories.
2023-02-16 12:05:34 +05:30
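
The add-and-remove behaviour described in #13791 amounts to recomputing one label from the current set of changed files. A minimal sketch, assuming the two directory prefixes below (they are guesses, not read from the action's configuration) and a hypothetical `update_labels` helper:

```python
# Assumed directory prefixes; the action's real path configuration is not shown here.
DOC_PREFIXES = ("docs/", "examples/quickstart/jupyter-notebooks/")
DOC_LABEL = "Area - Documentation"


def update_labels(changed_files, current_labels):
    """Return the label set a PR should carry given its changed files."""
    touches_docs = any(path.startswith(DOC_PREFIXES) for path in changed_files)
    labels = set(current_labels)
    if touches_docs:
        labels.add(DOC_LABEL)
    else:
        # The PR no longer touches the docs directories, so drop the label.
        labels.discard(DOC_LABEL)
    return labels
```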
Vadim Ogievetsky 1ca0edb8c9
Web console: Fixes query cancel NPE and more (#13786)
* add null icon

* empty string table cell

* enable views only if they will work

* make sure method exists

* use SQL compatible nulls for e2e tests
2023-02-15 15:02:50 -08:00
Abhishek Agarwal 460d8b8a2a
Add license header to template file (#13811) 2023-02-15 04:12:15 -08:00
Elliott Freis e5ecb4d6aa
Flip boolean logic so the test description makes sense (#13805)
Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.local>
2023-02-15 16:07:49 +05:30
Paul Rogers 333196d207
Code cleanup & message improvements (#13778)
* Misc cleanup edits

Correct spacing
Add type parameters
Add toString() methods to formats so tests compare correctly
IT doc revisions
Error message edits
Display UT query results when tests fail

* Edit

* Build fix

* Build fixes
2023-02-15 15:22:54 +05:30
Adarsh Sanjeev e8330e95f5
Update Apache Kafka dependencies to 3.4.0 (#13802)
Release notes:
- https://downloads.apache.org/kafka/3.4.0/RELEASE_NOTES.html
2023-02-15 15:15:13 +05:30
Jason Witkowski 355cdbeb86
helm: Fix PDB apiVersion to allow K8s 1.25+ deployment (#13783) 2023-02-15 11:24:01 +05:30
Tejaswini Bandlamudi 9ffaba9c7f
Fix MySQL drivers setup for Revised ITs (#13800)
* download both mysql drivers and use org.mariadb.jdbc.Driver for now

* use com.mysql.jdbc.Driver
2023-02-15 11:03:25 +05:30
Suneet Saldanha f67abf2e99
Better logs for query errors (#13776)
* Better logs for query errors

* checkstyle
2023-02-14 15:55:58 -08:00
서재권(Data Platform) f3e19f69bb
Support prometheus emitter (#13531)
modify helm chart to support scraping from prometheus automatically
2023-02-14 15:40:07 +05:30
Churro c1f283fd31
Better sidecar support (#13655)
* Better sidecar support

* remove un-thrown exception from test

* Druid you are such a stickler about spelling :)

* Only require the primaryContainerName, no need to exclude containers
2023-02-14 10:56:15 +05:30
Clint Wylie fa4cab405f
fix bug with sql planner when virtual column capabilities are null (#13797) 2023-02-13 18:27:23 -08:00
Paul Rogers 842ee554de
Refinements to input-source specific table functions (#13780)
Refinements to table functions

Fixes various bugs
Improves the structure of the table function classes
Adds unit and integration tests
2023-02-13 16:21:27 -08:00
Guy ☀️ Moore 306997be87
Add Perl 5 to druid requirements (#13708)
Without Perl 5 I was unable to start Druid using the instructions in the quickstart guide. I'm not certain which versions are required, but the one that worked for me was Perl 5:

> This is perl 5, version 36, subversion 0 (v5.36.0) built for x86_64-linux-thread-multi
2023-02-13 13:34:49 -08:00
Clint Wylie f09f83697d
fix array_agg to work with complex types and bugs with expression aggregator complex array handling (#13781)
* fix array_agg to work with complex types and bugs with expression aggregator complex array handling
* more consistent handling of array expressions, numeric arrays more consistently honor druid.generic.useDefaultValueForNull, fix array_ordinal sql output type
2023-02-12 22:01:39 -08:00
zachjsh 38e620aa4c
Operator conversion deny list (#13766)
### Description

This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify
any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions.

An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with HTTP response code `400` is returned with an error body similar to the following:

```
{
  "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985",
  "state": "FAILED",
  "error": {
    "error": "Plan validation failed",
    "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)",
    "errorClass": "org.apache.calcite.tools.ValidationException",
    "host": null
  }
}
```
2023-02-10 09:59:26 -08:00
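
The effect of the deny list in #13766 can be pictured as a filter applied to the registered operator conversions before planning. The sketch below is only an illustration: `parse_deny_list` and `allowed_operators` are hypothetical helpers, and exact, case-sensitive name matching is an assumption rather than documented behaviour.

```python
import json


def parse_deny_list(property_value: str) -> set:
    """Parse a JSON array such as '["extern"]' into a set of operator names."""
    return set(json.loads(property_value))


def allowed_operators(registered, deny_list):
    """Keep only the operator conversions that are not denied (exact match assumed)."""
    return [op for op in registered if op not in deny_list]


if __name__ == "__main__":
    deny = parse_deny_list('["extern"]')
    print(allowed_operators(["extern", "array_agg"], deny))
```

With the default value `[]` the filter keeps every operator, which matches the description of the default above.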
Tejaswini Bandlamudi 477bc424d9
Fix GHA tests cache hit miss scenario (#13772)
* rebuild maven project or docker image in case of cache hit miss
* rebuild maven project in case of docker cache hit miss too
* fix docker-restore cache hit fail issue
2023-02-10 08:57:45 -08:00
Tejaswini Bandlamudi 752964390e
remove Travis CI (#13789) 2023-02-10 01:46:56 -08:00
Anshu Makkar d7b95988d7
Add missing documentation for constant post-aggregator (#13664)
Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.
2023-02-09 08:53:45 -08:00
Clint Wylie ffeda72abb
fix filtering nested field virtual column when used with non nested column input (#13779)
* fix filtering nested field virtual column when used with non nested column input
2023-02-09 03:16:38 -08:00
Katya Macedo 58d9720b00
docs: notebook only for SQL tutorial (#13465)
CI Failures seem unrelated to docs

* docs: notebook only for SQL tutorial

* Update logical operators section

* Fix typo

* Adopt review suggestions

* Update examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb

* Update examples, add link to keywords

* Update after review

* Update per review comments

* Add links
2023-02-08 20:04:53 -08:00
Suneet Saldanha 714ac07b52
Allow users to add additional metadata to ingestion metrics (#13760)
* Allow users to add additional metadata to ingestion metrics

When submitting an ingestion spec, users may pass a map of metadata
in the ingestion spec config that will be added to ingestion metrics.

This will make it possible for operators to tag metrics with other
metadata that doesn't necessarily line up with the existing tags
like taskId.

Druid clusters that ingest these metrics can take advantage of the
nested data columns feature to process this additional metadata.

* rename to tags

* docs

* tests

* fix test

* make code cov happy

* checkstyle
2023-02-08 18:07:23 -08:00
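
To make the idea in #13760 concrete, the sketch below attaches a user-supplied map to a metric event as a nested field. The event layout, the field names, and the `build_metric_event` helper are assumptions for illustration; the actual emitter format is not shown in this log.

```python
def build_metric_event(metric, value, task_id, tags=None):
    """Build a metric event, nesting optional user-supplied tags under 'tags'."""
    event = {"metric": metric, "value": value, "taskId": task_id}
    if tags:
        # Copy so later changes to the caller's map do not alter the event.
        event["tags"] = dict(tags)
    return event


if __name__ == "__main__":
    # Illustrative metric name and tag values.
    print(build_metric_event("example/metric", 42, "task-1", {"team": "ads"}))
```

Keeping the tags in one nested object, rather than flattening them into the event, is what lets a downstream cluster treat them as a single nested data column.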
Adarsh Sanjeev d7a15be9bc
Add assertions for counters from reports (#13726)
Adds assertions for counters to MSQ unit tests
2023-02-08 16:33:37 +05:30
AmatyaAvadhanula 34c04daa9f
Fix infinite iteration in http sync monitoring (#13731)
* Fix infinite iteration in http task runner

* Fix infinite iteration in http server view

* Add tests
2023-02-08 15:14:11 +05:30
jamon d925ebdc9e
Bump app version in Helm Chart from 0.23.0 to 24.0.0 (#13341)
Co-authored-by: zemin <zemin.piao@adyen.com>
2023-02-08 11:48:47 +05:30
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Jason Witkowski 5934d5fffe
helm: Stop helm chart from failing if zkHosts is not set (#13746) 2023-02-08 10:43:35 +05:30
imply-cheddar f684df4c22
Use an HllSketchHolder object to enable optimized merge (#13737)
* Use an HllSketchHolder object to enable optimized merge

HllSketchAggregatorFactory.combine had been implemented using a
pure pair-wise, "make a union -> add 2 things to union -> get sketch"
algorithm.  This algorithm does 2 things that waste CPU:

1) The Union object always builds an HLL_8 sketch regardless of the
  target type.  This means that when the target type is not HLL_8, we
  spent CPU cycles converting to HLL_8 and back over and over again
2) By throwing away the Union object and converting back to the
  HllSketch only to build another Union object, we do lots and lots
  of copy+conversions of the HllSketch

This change introduces an HllSketchHolder object which can hold onto
a Union object and delay conversion back into an HllSketch until
it is actually needed.  This follows the same pattern as the
SketchHolder object for theta sketches.
2023-02-07 13:57:48 -08:00
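
The holder pattern described in #13737 can be sketched with toy types. This is not the DataSketches API: `ToyUnion` uses a plain set in place of real HLL state, and both class names are hypothetical. The point is only that merging keeps feeding one union, and conversion back to a sketch is deferred until `get_sketch` is called.

```python
class ToyUnion:
    """Stand-in for a sketch union; a plain set replaces the real HLL math."""

    conversions = 0  # counts how often a union is turned back into a sketch

    def __init__(self):
        self._items = set()

    def update(self, sketch):
        self._items |= sketch

    def to_sketch(self):
        ToyUnion.conversions += 1
        return frozenset(self._items)


class SketchHolder:
    """Holds either a finished sketch or a live union, converting lazily."""

    def __init__(self, sketch=frozenset()):
        self._sketch = frozenset(sketch)
        self._union = None

    def merge(self, other):
        if self._union is None:
            # First merge: create the union once and seed it with our sketch.
            self._union = ToyUnion()
            self._union.update(self._sketch)
        self._union.update(other.get_sketch())
        self._sketch = None  # stale until the next get_sketch()
        return self

    def get_sketch(self):
        if self._sketch is None:
            # Convert only when a caller actually needs the sketch.
            self._sketch = self._union.to_sketch()
        return self._sketch
```

Merging N holders this way performs one conversion at the end instead of one per pair.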
AmatyaAvadhanula dcdae84888
Add server view initialization metrics (#13716)
* Add server view init metrics

* Test coverage

* Rename metrics
2023-02-07 20:02:00 +05:30
Rohan Garg a0f8889f23
Robust handling and management of S3 streams for MSQ shuffle storage (#13741) 2023-02-07 14:17:37 +05:30
John Gozde b33962cab7
Upgrade typescript and other dependencies (#13762)
* Bump zustand, licenses

* Bump TypeScript, Eslint, use type imports

* Switch to react-shallow-renderer from enzyme

* Update ts-loader
2023-02-06 23:12:54 -08:00
Clint Wylie 2d3bee8545
various nested column (and other) fixes (#13732)
changes:
* modified druid schema column type computation to special case COMPLEX<json> handling to choose COMPLEX<json> if any column in any segment is COMPLEX<json>
* NestedFieldVirtualColumn can now work correctly on any type of column, returning either a column selector if a root path, or nil selector if not
* fixed a bug in NilVectorSelector where, when using a vector size larger than the default with druid.generic.useDefaultValueForNull=false, the nulls vector was set to all false instead of true
* fixed an overly aggressive check in ExprEval.ofType when handling complex types which would try to treat any string as base64 without gracefully falling back if it was not in fact base64 encoded, along with special handling for complex<json>
* added ExpressionVectorSelectors.castValueSelectorToObject and ExpressionVectorSelectors.castObjectSelectorToNumeric as convenience methods to cast vector selectors using cast expressions without the trouble of constructing an expression. the polymorphic nature of the non-vectorized engine (and significantly larger overhead of non-vectorized expression processing) made adding similar methods for non-vectorized selectors less attractive and so have not been added at this time
* fix inconsistency between nested column indexer and serializer in handling values (coerce non primitive and non arrays of primitives using asString)
* ExprEval best effort mode now handles byte[] as string
* added test for ExprEval.bestEffortOf, and add missing conversion cases that tests uncovered
* more tests more better
2023-02-06 19:48:02 -08:00
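
The ExprEval.ofType fix in #13732 concerns falling back gracefully when a string is not base64. A minimal sketch of that idea, using Python's standard `base64` module and a hypothetical `decode_if_base64` helper (this is not the Druid code):

```python
import base64
import binascii


def decode_if_base64(value: str):
    """Return decoded bytes when value is valid base64, else the string itself."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not base64 after all: fall back to the original string.
        return value
```

A string that merely happens to be valid base64 is still decoded, so a check like this can only be a best-effort heuristic.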
imply-cheddar 9c5b61e114
Fallback virtual column (#13739)
* Fallback virtual column

This virtual column enables falling back to another column if
the original column doesn't exist.  This is useful when doing
column migrations and you have some old data with column X,
new data with column Y and you want to use Y if it exists, X
otherwise so that you can run a consistent query against all of
the data.
2023-02-06 19:36:50 -08:00
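
The migration scenario in #13739 can be modelled with segments as dictionaries from column name to values. The `select_column` helper and the segment layout are hypothetical; the sketch only shows the "use Y if it exists, X otherwise" rule.

```python
def select_column(segment, preferred, fallback):
    """Return the preferred column if the segment has it, else the fallback."""
    if preferred in segment:
        return segment[preferred]
    # Neither column present yields an empty result in this sketch.
    return segment.get(fallback, [])


if __name__ == "__main__":
    old_segment = {"X": [1, 2]}   # written before the migration
    new_segment = {"Y": [3]}      # written after the migration
    for seg in (old_segment, new_segment):
        print(select_column(seg, "Y", "X"))
```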
Paul Rogers f28c06515b
Auto-detect docker-compose (#13754) 2023-02-06 21:29:45 +05:30
Laksh Singla 9100a61bf6
Fix NPE in postCleanupStage if stage doesn't exist (#13742)
With fault tolerance enabled in MSQ, not all the work orders might be populated if the worker is restarted. If the worker receives a request to clean up a stage that is not present in its map, it can throw an NPE. Added a check to ensure that the stage is present in the map before cleaning it up; otherwise a warning is logged.
2023-02-06 19:13:39 +05:30
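
The guard described in #13742 follows a common pattern: look the stage up first, and log a warning instead of failing when it is absent. A minimal sketch, assuming a hypothetical `post_cleanup_stage` function and a map from stage ID to a cleanup callable (the real worker data structures are not shown in this log):

```python
import logging

log = logging.getLogger("worker")


def post_cleanup_stage(stages: dict, stage_id: str) -> bool:
    """Clean up a stage if the worker knows it; otherwise warn and continue."""
    cleanup = stages.pop(stage_id, None)
    if cleanup is None:
        # A restarted worker may never have received this stage's work order.
        log.warning("Stage %s not found on this worker; skipping cleanup", stage_id)
        return False
    cleanup()
    return True
```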
Rohan Garg c5835c29a1
Use durable super sorter intermediate storage only with composable storage (#13748)
* This enables usage of durable storage connector only in case the composable storage feature is enabled.
2023-02-06 18:59:18 +05:30
Elliott Freis e16639121f
Local pathing for tests (#13753)
Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net>
2023-02-03 20:11:17 -08:00
Elliott Freis c06631037d
Moving to SHA based cache key (#13751)
Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net>
2023-02-03 15:49:17 -08:00
Suneet Saldanha bea18dc9e4
Update basic auth examples (#13750) 2023-02-03 14:45:48 -08:00