druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	3ff51487b7	Add ZooKeeper connection state alerts and metrics. (#14333 ) * Add ZooKeeper connection state alerts and metrics. - New metric "zk/connected" is an indicator showing 1 when connected, 0 when disconnected. - New metric "zk/disconnected/time" measures time spent disconnected. - New alert when Curator connection state enters LOST or SUSPENDED. * Use right GuardedBy. * Test fixes, coverage. * Adjustment. * Fix tests. * Fix ITs. * Improved injection. * Adjust metric name, add tests.	2023-07-12 09:34:28 -07:00
Gian Merlino	3711c0d987	Reduce heap footprint of GenericIndexed. (#14563 ) Two changes: 1) Intern DecompressingByteBufferObjectStrategy. Saves ~32 bytes per column. 2) Split GenericIndexed into GenericIndexed.V1 and GenericIndexed.V2. The major benefit here is isolating out the ByteBuffers that are only needed for V2. This saves ~80 bytes for V1 (one buffer instead of two).	2023-07-12 08:11:41 -07:00
Gian Merlino	cc8b210e4c	AggregatorFactory: Use guessAggregatorHeapFootprint when factorizeWithSize is not implemented. (#14567 ) There are two ways of estimating heap footprint of an Aggregator: 1) AggregatorFactory#guessAggregatorHeapFootprint 2) AggregatorFactory#factorizeWithSize + Aggregator#aggregateWithSize When the second path is used, the default implementation of factorizeWithSize is now updated to delegate to guessAggregatorHeapFootprint, making these equivalent. The old logic used getMaxIntermediateSize, which is less accurate. Also fixes a bug where, when using the second path, calling factorizeWithSize on PassthroughAggregatorFactory would fail because getMaxIntermediateSize was not implemented. (There is no buffer aggregator, so there would be no need.)	2023-07-12 07:33:27 -07:00
hqx871	7142b0c39e	Enable result level cache for GroupByStrategyV2 on broker (#11595 ) Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.	2023-07-12 15:00:01 +05:30
Nhi Pham	d76903f10b	Tasks API documentation refactor (#14492 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-07-11 13:19:39 -07:00
Abhishek Radhakrishnan	854ef98235	Minor doc fixes. (#14565 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-07-11 13:12:40 -07:00
YongGang	0ca3ba0b30	Add service/heartbeat metric into statsd-reporter (#14564 )	2023-07-11 12:38:08 -07:00
Nhi Pham	a764ed7fde	Update Jupyter notebook tutorial instructions for ARM devices (#14459 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-07-11 10:01:20 -07:00
dependabot[bot]	c91148c43b	Bump tough-cookie from 4.0.0 to 4.1.3 in /web-console (#14557 ) Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from 4.0.0 to 4.1.3. - [Release notes](https://github.com/salesforce/tough-cookie/releases) - [Changelog](https://github.com/salesforce/tough-cookie/blob/master/CHANGELOG.md) - [Commits](https://github.com/salesforce/tough-cookie/compare/v4.0.0...v4.1.3) --- updated-dependencies: - dependency-name: tough-cookie dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-07-11 08:53:42 -07:00
Laksh Singla	5ce536355e	Fix planning bug while using sort merge frame processor (#14450 ) sqlJoinAlgorithm is now a hint to the planner to execute the join in the specified manner. The planner can decide to ignore the hint if it deduces that the specified algorithm can be detrimental to the performance of the join beforehand.	2023-07-11 09:58:44 +00:00
Pranav	8087aa2b80	Adding the null check in combine and fold in doublesSketch (#14568 )	2023-07-11 14:28:34 +05:30
Adarsh Sanjeev	30a91be15a	Add log statements for tmpStorageBytes in MSQ (#14449 ) * Add log statements for tmpStorageBytes in MSQ * Add log * Update log message	2023-07-11 11:02:12 +05:30
imply-cheddar	66cac08a52	Refactor HllSketchBuildAggregatorFactory (#14544 ) * Refactor HllSketchBuildAggregatorFactory The usage of ColumnProcessors and HllSketchBuildColumnProcessorFactory made it very difficult to figure out what was going on from just looking at the AggregatorFactory or Aggregator code. It also didn't properly double check that you could use UTF8 ahead of time, even though it's entirely possible to validate it before trying to use it. This refactor keeps the general indirection that had been implemented by the Consumer<Supplier<HllSketch>> but centralizes the decision logic and makes it easier to understand the code. * Test fixes * Add test that validates the types are maintained * Add back indirection to avoid buffer calls * Cover floats and doubles are the same thing * Static checks	2023-07-10 09:57:09 -07:00
Tejaswini Bandlamudi	c3f84f9ea0	Suppress CVEs (#14291 ) Address various CVEs by upgrading dependencies or adding suppression with a justification	2023-07-10 15:19:26 +05:30
Kashif Faraz	58a35bf07e	Deprecate EntryExistsException in Druid 27 and remove in Druid 28 (#14554 ) Also deprecate UnknownSegmentIdsException.	2023-07-08 15:40:14 +05:30
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old formatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Katya Macedo	5f94a2a9c2	Add link to Slack channel (#14553 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-07-07 10:09:15 -07:00
Gian Merlino	021a01df45	RTR, HRTR: Fix incorrect maxLazyWorkers check in markLazyWorkers. (#14545 ) Recently #14532 fixed a problem when maxLazyWorkers == 0 and lazyWorkers starts out empty. Unfortunately, even after that patch, there remained a more general version of this problem when maxLazyWorkers == lazyWorkers.size(). This patch fixes it. I'm not sure if this would actually happen in production, because the provisioning strategies do try to avoid calling markWorkersLazy until previously-initiated terminations have finished. Nevertheless, it still seems like a good thing to fix.	2023-07-07 10:08:12 -07:00
Laksh Singla	9e617373a0	Handle dimensionless group by queries with partitioning	2023-07-07 21:51:47 +05:30
Karan Kumar	afa8c7b8ab	Adding Ability for MSQ to write select results to durable storage. (#14527 ) One of the most requested features in druid is to have an ability to download big result sets. As part of #14416 , we added an ability for MSQ to be queried via a query friendly endpoint. This PR builds upon that work and adds the ability for MSQ to write select results to durable storage. We write the results to the durable storage location <prefix>/results/<queryId> in the druid frame format. This is exposed to users by /v2/sql/statements/:queryId/results.	2023-07-07 20:49:48 +05:30
Kashif Faraz	40d0dc9e0e	Use separate executor to handle task updates in TaskQueue (#14533 ) Description: `TaskQueue.notifyStatus` is often a heavy call as it performs the following operations: - Update task status in metadata DB - Update task locks in metadata DB - Request (synchronously) the task runner to shutdown the completed task - Clean up in-memory data structures This method can often be slow and can cause worker sync / task runners to slow down. Main changes: - Run task completion callbacks in a separate executor to handle task completion updates - Add new config `druid.indexer.queue.taskCompleteHandlerNumThreads` - Add metrics to monitor number of processed and queued items - There are still other paths that can invoke `notifyStatus`, but those need not be moved to the new executor as they are synchronous on purpose. Other changes: - Add new metrics `task/status/queue/count`, `task/status/handled/count` - Add `TaskCountStatsProvider.getStats()` which deprecates the other `getXXXTaskCount` methods. - Use `CoordinatorRunStats` to collect and report metrics. This class has been used as is for now but will later be renamed and repurposed to use across all Druid services.	2023-07-07 20:43:12 +05:30
Jan Werner	95115d722a	CVE fixes - update of multiple dependencies. (#14519 ) Apache Druid brings multiple direct and transitive dependencies that are affected by a plethora of CVEs. This PR attempts to update all the dependencies that did not require code refactoring. This PR modifies pom files, license file and OWASP Dependency Check suppression file.	2023-07-07 20:27:30 +05:30
Gian Merlino	1fe61bc869	ChangeRequestHttpSyncer: Don't wait 1ms when checking isInitialized(). (#14547 ) The wait doesn't seem to serve a purpose, other than causing delays when checking isInitialized() for a large number of things that have not yet been initialized.	2023-07-07 05:54:39 -07:00
Kashif Faraz	d63eff3b1b	Reduce contention in HttpRemoteTaskRunner.getKnownTasks() (#14541 )	2023-07-07 13:43:59 +05:30
Gian Merlino	dd78e00dc5	Fix ColumnSignature error message and jdk17 test issue. (#14538 ) * Fix ColumnSignature error message and jdk17 test issue. On jdk17, the "problem" part of the error message could change from NullPointerException to: Cannot invoke "String.length()" because "s" is null Due to the new more-helpful NPEs in Java 17. This broke the expectation and led to test failures on this case. This patch fixes the problem by improving the error message so it isn't a generic NullPointerException. * Fix format.	2023-07-06 15:10:59 -07:00
Gian Merlino	037f09bef2	HttpRemoteTaskRunner: Fix markLazyWorkers for maxLazyWorkers == 0. (#14532 )	2023-07-06 11:51:04 -07:00
Abhishek Radhakrishnan	d02bb8bb6e	Set explain attributes after the query is prepared (#14490 ) * Add support for DML WITH AS. * One more UT for with as subquery. * Add a test with join query * Use root query prepared node instead of individual SqlNode types. - Set the explain plan attributes after the query is prepared when the query is planned and we've the finalized output names in the root source rel node. - Adjust tests; add unit test for negative ordinal case. - Remove the exception / error handling logic from resolveClusteredBy function since the validations now happen before it comes to the function * Update comment.	2023-07-06 14:13:32 -04:00
imply-cheddar	5fc122a144	Add window-focused tests from Drill (#13773 ) This commit borrows some test definitions from Drill's test suite and tries to use them to flesh out the full validation of window function capabilities. In order to be able to run these tests, we also add the ability to run a Scan operation against segments, which also meant an implementation of RowsAndColumns for frames.	2023-07-06 09:20:32 -07:00
imply-cheddar	277b357256	Optimize IntervalIterator (#14530 ) UniformGranularityTest's test to test a large number of intervals runs through 10 years of 1 second intervals. This pushes a lot of stuff through IntervalIterator and shows up in terms of test runtime as one of the hottest tests. Most of the time is going to constructing jodatime objects because it is doing things with DateTime objects instead of millis. Change the calls to use millis instead and things go faster.	2023-07-06 14:44:23 +05:30
Kashif Faraz	87bb1b9709	Fix bug during initialization of HttpServerInventoryView (#14517 ) If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`, the initialization gets stuck as this server is never synced. The method eventually times out (default 250s). Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for non-stopped servers to sync. Other changes: - Add new metrics for better debugging of slow broker/coordinator startup - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable - Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer` - Minor refactor for readability - Add utility class `Stopwatch` - Add tests and stubs	2023-07-06 13:04:53 +05:30
Kashif Faraz	a6547febaf	Remove unused coordinator dynamic configs (#14524 ) After #13197 , several coordinator configs are now redundant as they are not being used anymore, neither with `smartSegmentLoading` nor otherwise. Changes: - Remove dynamic configs `emitBalancingStats`: balancer error stats are always emitted, debug stats can be logged by using `debugDimensions` - `useBatchedSegmentSampler`, `percentOfSegmentsToConsiderPerMove`: batched segment sampling is always used - Add test to verify deserialization with unknown properties - Update `CoordinatorRunStats` to always track stats, this can be optimized later.	2023-07-06 12:11:10 +05:30
Soumyava	78db7a4414	A query in MSQ would issue wrong error code (#14531 ) with a RuntimeException. Now the RuntimeException is being replaced by a user-facing DruidException of Invalid category which would allow calcite not to throw an uncategorized exception.	2023-07-06 08:59:35 +05:30
Jonathan Wei	f29a9faa94	Better surfacing of invalid pattern errors for SQL REGEXP_EXTRACT function (#14505 )	2023-07-05 17:12:54 -05:00
Victoria Lim	50b7e5d20e	docs: fix links (#14504 )	2023-07-05 12:29:47 -07:00
AmatyaAvadhanula	609833c97b	Do not emit negative lag because of stale offsets (#14292 ) The latest topic offsets are polled frequently and used to determine the lag based on the current offsets. However, when the offsets are stale (which can commonly happen due to connection issues), we may see a negative lag. This PR prevents emission of metrics when the offsets are stale and at least one of the partitions has a negative lag.	2023-07-05 14:44:23 +05:30
Jakub Matyszewski	cc159f4317	docs: k8s-jobs role needs batch apigroup (#14343 )	2023-07-04 14:34:20 +05:30
Rishabh Singh	e2676c390e	Downgrade busybox version to fix k8s IT (#14518 )	2023-07-04 12:21:35 +05:30
Nhi Pham	4ee7b14f5f	update links in jupyter notebook (#14404 )	2023-07-03 13:50:25 -07:00
Tejaswini Bandlamudi	c04a36d15b	Run IntelliJ-inspections in parallel to static-checks & web-checks in GHA (#14515 ) Currently, IntelliJ-inspections are run sequentially w.r.t static-checks, thereby increasing build time. Moving IntelliJ-inspections to a separate job to improve builds time and get a quick insight into such issues early on.	2023-07-03 17:10:19 +05:30
Adarsh Sanjeev	27a70d569d	Add page information to SqlStatementResource API (#14512 ) * Changes the get results API in SqlStatementResource to take a page number instead of row/offset. * Adds "pages" containing information on each page to the results status. * Update the "numRows" and "sizeInBytes" to "numTotalRows" and "totalSizeInBytes" respectively, which are totalled across all pages.	2023-07-03 15:20:14 +05:30
Pranav	2d5b27358e	Logging the fieldName in the coerce exceptions (#14483 ) Logging the fieldName in the coerce exceptions	2023-07-03 14:13:27 +05:30
Clint Wylie	277aaa5c57	remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500 ) * combine string column implementations changes: * generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations * remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings * remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user	2023-07-02 19:37:15 -07:00
Gian Merlino	58f3faf299	SortMergeJoinFrameProcessor: Fix two bugs with buffering. (#14196 ) 1) Fix a problem where the fault wasn't reported when the left-hand side had too many buffered frames. (Instead, frames continued to be buffered, eventually running the server out of memory.) 2) Always update the mark when rewinding isn't necessary. It fixes a problem where frames would be needlessly buffered when there isn't a key match across the two sides. 3) Memory reserved for building the trackers now changes based on the heap size.	2023-07-02 19:52:52 +05:30
Gian Merlino	048dbcee88	MSQ: Improve InsertTimeOutOfBounds error message. (#14511 ) Nicer and actionable error message for `InsertTimeOutOfBounds` fault	2023-07-02 01:44:19 +05:30
Gian Merlino	67fbd8e7fc	Add "stringEncoding" parameter to DataSketches HLL. (#11201 ) * Add "stringEncoding" parameter to DataSketches HLL. Builds on the concept from #11172 and adds a way to feed HLL sketches with UTF-8 bytes. This must be an option rather than always-on, because prior to this patch, HLL sketches used UTF-16LE encoding when hashing strings. To remain compatible with sketch images created prior to this patch -- which matters during rolling updates and when reading sketches that have been written to segments -- we must keep UTF-16LE as the default. Not currently documented, because I'm not yet sure how best to expose this functionality to users. I think the first place would be in the SQL layer: we could have it automatically select UTF-8 or UTF-16LE when building sketches at query time. We need to be careful about this, though, because UTF-8 isn't always faster. Sometimes, like for the results of expressions, UTF-16LE is faster. I expect we will sort this out in future patches. * Fix benchmark. * Fix style issues, improve test coverage. * Put round back, to make IT updates easier. * Fix test. * Fix issue with filtered aggregators and add test. * Use DS native update(ByteBuffer) method. Improve test coverage. * Add another suppression. * Fix ITAutoCompactionTest. * Update benchmarks. * Updates. * Fix conflict. * Adjustments.	2023-06-30 12:45:55 -07:00
Pranav	4b2d87336a	Add additional index on task table (#14470 )	2023-06-29 15:32:43 -07:00
Gian Merlino	e10e35aa2c	Add REGEXP_REPLACE function. (#14460 ) * Add REGEXP_REPLACE function. Replaces all instances of a pattern with a replacement string. * Fixes. * Improve test coverage. * Adjust behavior.	2023-06-29 13:47:57 -07:00
Gian Merlino	a6cabbe10f	SQL: Avoid "intervals" for non-table-based datasources. (#14336 ) In these other cases, stick to plain "filter". This simplifies lots of logic downstream, and doesn't hurt since we don't have intervals-specific optimizations outside of tables. Fixes an issue where we couldn't properly filter on a column from an external datasource if it was named __time.	2023-06-29 09:57:11 +05:30
Gian Merlino	c798d3fb2e	Fix flaky SqlStatementResourceTest. (#14498 ) Mocks generally have state and should not be static. In particular, the "Yielder" included in one of the mocks can only be iterated once, which made the test suite order-dependent.	2023-06-29 05:42:44 +05:30
Gian Merlino	fd1a88a6b3	.asf.yaml: Add required "repository" field. (#14499 ) Our new .asf.yaml is not getting picked up because custom_subjects must include "repository".	2023-06-28 15:05:07 -07:00
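
The stale-offset rule described in commit 609833c97b (#14292) above can be illustrated with a small sketch. This is not Druid's Java implementation: the function name `compute_lag_to_emit`, its parameters, and the dictionary-based offsets are all hypothetical, chosen only to show the condition stated in the commit message (skip emission when offsets are stale and at least one partition has negative lag).

```python
from typing import Dict, Optional


def compute_lag_to_emit(
    latest_offsets: Dict[int, int],
    current_offsets: Dict[int, int],
    offsets_are_stale: bool,
) -> Optional[Dict[int, int]]:
    """Return per-partition lag, or None when the lag should not be emitted.

    Hypothetical helper: names and types are assumptions for illustration,
    not taken from the Druid codebase.
    """
    # Lag per partition = latest known topic offset minus the offset consumed so far.
    lag = {
        partition: latest_offsets[partition] - current
        for partition, current in current_offsets.items()
        if partition in latest_offsets
    }

    # Rule from the commit message: suppress emission only when the polled
    # offsets are stale AND at least one partition shows a negative lag.
    if offsets_are_stale and any(value < 0 for value in lag.values()):
        return None

    return lag
```

Under these assumptions, a caller would emit the lag metric only when the function returns a mapping, and would skip the emission cycle when it returns `None`.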


13221 Commits