druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	e76a5077ef	Fix self-referential shape inspection in BaseExpressionColumnValueSelector. (#12669 ) * Fix self-referential shape inspection in BaseExpressionColumnValueSelector. The new test would throw StackOverflowError on the old code. * Restore prior test.	2022-06-17 16:15:50 -07:00
Clint Wylie	18937ffee2	split out null value index (#12627 ) * split out null value index * gg spotbugs * fix stuff	2022-06-17 15:29:23 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update segments.md * Update segments.md * Update segments.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web console (indirectly) calls the Overlord’s GET tasks API to fetch task summaries, which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, for all rows at once. This introduces significant memory overhead and can cause unresponsiveness or Overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API). Note also that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to a slow UI. While we are fixing the memory pressure in the Overlord, we can also fix the UI slowness caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store, as reported in the community (apache/druid issue #12318, "Metadata store query performance degrades as the tasks in druid_tasks table grows"). The task summaries returned by the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Lucas Capistrant	602d95d865	Add a builder class for TestDruidCoordinatorConfig (#12624 ) * Add a builder class for TestDruidCoordinatorConfig * updates after review * Fix formatting	2022-06-16 09:11:31 -05:00
Victoria Lim	94564b6ce6	Update screenshots for Druid console doc (#12593 ) * druid console doc updates * remove extra image * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * updated screenshot labels Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-15 16:42:20 -07:00
Gian Merlino	70f3b13621	ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592 ) * ForkingTaskRunner: Set ActiveProcessorCount for tasks. This prevents various automatically-sized thread pools from being unreasonably large (we don't want each task to size its pools as if it is the only thing on the entire machine). * Fix tests. * Add missing LifecycleStart annotation. * ForkingTaskRunner needs ManageLifecycle.	2022-06-15 15:56:32 -07:00
Paul Rogers	45e3111549	Clean up query contexts (#12633 ) * Clean up query contexts Uses constants in place of literal strings for context keys. Moves some QueryContext methods to QueryContexts for reuse. * Revisions from review comments	2022-06-15 11:31:22 -07:00
Rohan Garg	28f2c8e112	Support LoadScope for Peons + Access Modifier Updates (#12640 ) * Support LoadScope for Peons * Update access modifiers for GroupByEngineV2	2022-06-14 21:52:50 -07:00
Gian Merlino	283249c51b	NettyHttpClient: Fix double-return on certain exceptions. (#12626 ) The "exceptionCaught" handler may get called multiple times. We should only return the channel to the pool the first time. Returning it more than once leads to a warning like "Resource at key[%s] was returned multiple times?"	2022-06-14 21:40:47 -07:00
Gian Merlino	1f6e888472	Add QoSFilters first in the chain. (#12625 ) * Add QoSFilters first in the chain. When a request is suspended and later resumed due to QoS constraints, its filter chain is restarted. Placing QoSFilters first in the chain avoids double-execution of other filters. Fixes an issue where requests deferred by QoS would report 403 Forbidden due to double-execution of SecuritySanityCheckFilter. * Smaller changes. * Add QoS filters in BaseJettyTest. * Remove unused parameter.	2022-06-14 13:37:00 -07:00
Gian Merlino	ceb4ace118	NettyHttpClient: Replace ReadTimeoutException with our own exception. (#12635 ) * NettyHttpClient: Replace ReadTimeoutException with our own exception. * Replace exception with same type. * Remove unused import.	2022-06-14 13:34:46 -07:00
Vadim Ogievetsky	6f7fa334fd	Web console: totalNumMergeTasks can be set on range also (#12648 ) * totalNumMergeTasks can be set on range also * fix formatting	2022-06-14 11:18:17 -07:00
Atul Mohan	68bae6eafb	Fix version in master (#12644 )	2022-06-14 11:32:46 +05:30
Rohan Garg	afaea251f2	Push join build table values as filter in case of duplicates (#12225 ) * Push join build table values as filter * Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter * fixup! Add tests for JoinableFactoryWrapper * fixup! Push join build table values as filter	2022-06-13 17:18:27 -07:00
317brian	27e8b43673	fix: update footer copyright year (#12594 )	2022-06-13 16:29:58 -07:00
Gian Merlino	1ace7336cd	Update node to 14.19.3. (#12632 )	2022-06-10 10:18:12 -07:00
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
TSFenwick	a3603ad6b0	Use DefaultQueryConfig in SqlLifecycle to correctly populate request logs (#12613 ) Fixes an issue where sql query request logs do not include the default query context values set via `druid.query.default.context.xyz` runtime properties. # Change summary * Inject `DefaultQueryConfig` into `SqlLifecycleFactory` * Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle` # Description - This change does not affect query execution. This is because the `DefaultQueryConfig` was already being used in `QueryLifecycle`, which is initialized when the SQL is translated to a native query. - This also handles any potential use case where a context parameter should be handled at the SQL stage itself.	2022-06-08 12:52:50 +05:30
Gian Merlino	8fbf92e047	SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 ) * SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. Segments with endpoints prior to year 0 or after year 9999 may overlap the search intervals but not match the generated SQL conditions. So, we need to add an additional OR condition to catch these. I checked a real, live MySQL metadata store to confirm that the query still uses metadata store indexes. It does. * Add comments.	2022-06-07 11:33:46 -07:00
Abhishek Agarwal	59a0c10c47	Add remedial information in error message when type is unknown (#12612 ) Users often submit queries and ingestion specs that work only if the relevant extension is loaded. However, the resulting error is too technical for users and doesn't suggest checking for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is caused by a bug.	2022-06-07 20:22:45 +05:30
Laksh Singla	81c37c6515	Add validation for invalid partitioned by granularities (#12589 ) * Add validation for invalid partitioned by granularities * review comments * improve error message, change location of the method * remove imports * use StringUtils.lowercase Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2022-06-06 22:00:29 +05:30
Adarsh Sanjeev	5a283964ca	Improve SQL validation error messages (#12611 ) Update the SQL validation error message to specify whether the ingest is INSERT or REPLACE for better user experience.	2022-06-06 16:14:28 +05:30
Gian Merlino	abf0e0a159	CompressionStrategyTest: Fix thread-unsafe Closer usage. (#12605 ) Closer is not thread-safe, so we need one per thread in the concurrency tests.	2022-06-04 10:57:13 -07:00
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
Victoria Lim	1506b26ce4	fix typo (#12607 )	2022-06-04 13:14:18 +08:00
Gian Merlino	a27f4f5740	Service stdout log files, move logs to log/. (#12570 ) * Service stdout log files, move logs to log/. Two changes that make log behavior cleaner: 1) Redirect messages from the Java runtime to their own log files. Otherwise, they would get jumbled up in the output of the all-in-one start command. 2) Use log/ instead of bin/log/ for the default log directory. Makes them easier to find. Additionally, add documentation about how to avoid the reflective access warnings in Java 11. * Spelling. * See if code formatting affects spelling.	2022-06-03 10:44:29 +05:30
Jill Osborne	9c8e6bb000	Addition to Multitenancy considerations doc (#12567 ) * Small addition to Multitenancy considerations doc * Update docs/querying/multitenancy.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update multitenancy.md Edit suggested by @kfaraz Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-02 10:32:14 -07:00
dependabot[bot]	4558b815e5	Bump eventsource from 1.1.0 to 1.1.1 in /web-console (#12595 ) Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1. - [Release notes](https://github.com/EventSource/eventsource/releases) - [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md) - [Commits](https://github.com/EventSource/eventsource/compare/v1.1.0...v1.1.1) --- updated-dependencies: - dependency-name: eventsource dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 22:04:30 -07:00
dependabot[bot]	c49277bd2b	Bump eventsource from 1.0.7 to 1.1.1 in /website (#12596 ) Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1. - [Release notes](https://github.com/EventSource/eventsource/releases) - [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md) - [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.1) --- updated-dependencies: - dependency-name: eventsource dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 22:04:04 -07:00
Clint Wylie	98f6bca2cd	fix regression with ipv4_match and prefixes (#12542 ) * fix issue with ipv4_match and prefixes	2022-06-01 14:03:08 -07:00
dependabot[bot]	23b9a6f9eb	Bump lodash from 4.17.15 to 4.17.21 in /website (#12409 ) Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.21. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.21) --- updated-dependencies: - dependency-name: lodash dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 13:56:22 -07:00
dependabot[bot]	86d01b3681	Bump opentelemetry-instrumentation-bom-alpha (#12531 ) Bumps [opentelemetry-instrumentation-bom-alpha](https://github.com/open-telemetry/opentelemetry-java-instrumentation) from 1.7.0-alpha to 1.14.0-alpha. - [Release notes](https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-java-instrumentation/commits) --- updated-dependencies: - dependency-name: io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 13:51:39 -07:00
Clint Wylie	31f988ec76	fix backwards compatibility for explicit null columns (#12585 )	2022-06-01 12:39:48 -07:00
AmatyaAvadhanula	f7ce73eee7	Suppress CVEs (#12590 )	2022-06-01 21:22:32 +05:30
Clint Wylie	dc0fdfec67	fix test comment (#12584 )	2022-05-31 12:39:20 -07:00
Clint Wylie	0640c9c9ac	fix compression-strategy-test (#12575 ) fixes an issue caused by a test modification in #12408 that was closing buffers allocated by the compression strategy instead of allowing the closer to do it	2022-05-31 11:48:32 -07:00
Gian Merlino	02ae3e74ff	RowBasedColumnSelectorFactory: Add "useStringValueOfNullInLists" parameter. (#12578 ) RowBasedColumnSelectorFactory inherited strange behavior from Rows.objectToStrings for nulls that appear in lists: instead of being left as a null, it is replaced with the string "null". Some callers may need compatibility with this strange behavior, but it should be opt-in. Query-time call sites are changed to opt-out of this behavior, since it is not consistent with query-time expectations. The IncrementalIndex ingestion-time call site retains the old behavior, as this is traditionally when Rows.objectToStrings would be used.	2022-05-31 11:38:56 -07:00
Gian Merlino	b639298f6e	CompressionUtils: Increase gzip buffer size. (#12579 )	2022-05-31 11:38:13 -07:00
Gian Merlino	6d2ff796a3	Add RowIdSupplier to ColumnSelectorFactory. (#12577 ) * Add RowIdSupplier to ColumnSelectorFactory. This enables virtual columns to cache their outputs in case they are called multiple times on the same underlying row. This is common for numeric selectors, where the common pattern is to call isNull() and then follow with getLong(), getFloat(), or getDouble(). Here, output caching reduces the number of expression evals by half. * Fix tests.	2022-05-31 11:38:03 -07:00
Clint Wylie	b746bf9129	fix virtual column cycle bug, sql virtual column optimize bug (#12576 ) * fix virtual column cycle bug, sql virtual column optimize bug * more test	2022-05-30 23:51:21 -07:00
Dr. Sizzles	7291c92f4f	Adding zstandard compression library (#12408 ) * Adding zstandard compression library * 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct. 2. Cleaned up checkstyle issues. * Fixing zstandard version to latest stable version in POMs and updating license files * Removing zstd from benchmarks and adding to processing (POMs) * fix the intellij inspection issue * Removing the prefix v for the version in the license check for zstd * Fixing license checks Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-05-28 17:01:44 -07:00
Dongjoon Hyun	79f86a0511	Upgrade ORC to 1.7.4 (#12572 ) This commit upgrades Apache ORC library from 1.7.2 to 1.7.4. Apache ORC 1.7.4 is the maintenance release with the following bug fixes. https://orc.apache.org/news/2022/04/15/ORC-1.7.4/ https://github.com/apache/orc/releases/tag/v1.7.4	2022-05-28 17:44:36 +05:30
Clint Wylie	d0c9c37e35	make query context changes backwards compatible (#12564 ) Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.	2022-05-25 15:24:41 +05:30
Karan Kumar	9f9faeec81	object[] handling for DimensionHandlers for arrays (#12552 ) Fixes a bug when running queries like SELECT cntarray, Count() FROM (SELECT dim1, dim2, Array_agg(cnt) AS cntarray FROM (SELECT dim1, dim2, dim3, Count() AS cnt FROM foo GROUP BY 1, 2, 3) GROUP BY 1, 2) GROUP BY 1 This generates an error: org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx] Because the value is an array of numbers, it appears to go through the convertToList call, which looks like: @Nullable public static ComparableList convertToList(Object obj) { if (obj == null) { return null; } if (obj instanceof List) { return new ComparableList((List) obj); } if (obj instanceof ComparableList) { return (ComparableList) obj; } throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName()); } That is, it does not handle arrays. This PR adds the array handling.	2022-05-25 15:24:18 +05:30
Abhishek Agarwal	b10eb4cbd4	Suppress false CVE on druid-indexing-hadoop artifact (#12562 )	2022-05-24 16:00:58 +05:30
Abhishek Agarwal	32fe4d1324	Use a different repository to download sigar artifacts. (#12561 )	2022-05-24 14:42:51 +05:30
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
Adarsh Sanjeev	5063eca5b9	Add error message for incorrectly ordered clause in sql (#12558 ) When the CLUSTERED BY clause appears before the PARTITIONED BY clause in a SQL query, the error message is confusing. insert into foo select * from bar clustered by dim1 partitioned by all Error: SQL parse failed Encountered "PARTITIONED" at line 1, column 88. Was expecting one of: <EOF> "," ... "ASC" ... "DESC" ... "NULLS" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "\|\|" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "(" ... org.apache.calcite.sql.parser.SqlParseException A check could be added to throw a more user-friendly message stating that the order should be reversed. Add error message for incorrectly ordered clause in sql.	2022-05-23 12:41:18 +05:30


11945 Commits
