This PR improves how WriteOutBytes and WriteOutMedium work. Analysis of TmpFileSegmentWriteOutMedium usage shows that it is sometimes used to write very small amounts of data, and in those cases the overhead of creating a tmp file is very large. To improve performance in these cases, this PR modifies TmpFileSegmentWriteOutMedium to return a heap-based WriteOutBytes that falls back to a tmp file only when the heap buffer actually fills up.
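The heap-first, file-on-overflow idea can be sketched roughly as follows. This is an illustration only, not Druid's WriteOutBytes implementation: the class name `SpillingBuffer`, its methods, and the `maxHeapBytes` threshold are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: buffers writes on the heap and only creates a tmp file
// once the configured heap limit would be exceeded.
class SpillingBuffer {
  private final int maxHeapBytes;
  private ByteArrayOutputStream heap = new ByteArrayOutputStream();
  private OutputStream fileOut; // non-null once the buffer has spilled to disk
  private Path tmpFile;
  private long size;

  SpillingBuffer(int maxHeapBytes) {
    this.maxHeapBytes = maxHeapBytes;
  }

  void write(byte[] bytes) {
    try {
      if (fileOut == null && heap.size() + bytes.length > maxHeapBytes) {
        // Heap limit reached: create the tmp file lazily and move buffered bytes into it.
        tmpFile = Files.createTempFile("spill", ".bin");
        fileOut = Files.newOutputStream(tmpFile);
        heap.writeTo(fileOut);
        heap = null;
      }
      if (fileOut != null) {
        fileOut.write(bytes);
      } else {
        heap.write(bytes, 0, bytes.length);
      }
      size += bytes.length;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  boolean isOnHeap() {
    return fileOut == null;
  }

  long size() {
    return size;
  }

  // Closes and deletes the tmp file, if one was ever created.
  void close() {
    try {
      if (fileOut != null) {
        fileOut.close();
        Files.deleteIfExists(tmpFile);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a limit of 8 bytes, a 4-byte write stays on the heap and a following 8-byte write triggers the spill, so small outputs never touch the file system.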
---------
Co-authored-by: imply-cheddar <eric.tschetter@imply.io>
* just starting
* TIME_PARSE and TIME_FORMAT remaining
* fixing typo
* adding last two functions
* review sql-functions.md
* Apply suggestions from code review
Suggestions that were accepted as is
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/querying/sql-functions.md
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Update docs/querying/sql-functions.md
needed to confirm that it did indeed return as a number
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* reviewing remaining suggestions
* addressing review for time_format
* Apply suggestions from code review
Accepted as is
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* addressing final suggestion
* time_zone -> timezone
* timezone fix
---------
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Updated auth to use variables with default values
* Update docs/api-reference/sql-ingestion-api.md
* Remove Python auth entirely as it's not being used
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Previously, SeekableStreamIndexTaskRunner set ingestion state to
COMPLETED when it finished reading data from Kafka. This is incorrect.
After the changes in this patch, the transitions go:
1) The task stays in BUILD_SEGMENTS after it finishes reading from Kafka,
while it is building its final set of segments to publish.
2) The task transitions to SEGMENT_AVAILABILITY_WAIT after publishing,
while waiting for handoff.
3) The task transitions to COMPLETED immediately before exiting, when
truly done.
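The transitions listed above can be sketched as a mapping from what the task is doing to the state it reports. The state names come from this message; the `TaskPhase` enum and the `IngestionStateSketch` class are hypothetical and do not mirror the actual SeekableStreamIndexTaskRunner code.

```java
// Hypothetical phases of a streaming task's life, for illustration only.
enum TaskPhase { READING, BUILDING_FINAL_SEGMENTS, WAITING_FOR_HANDOFF, EXITING }

// Only the three states named in the commit message are modeled here.
enum SketchState { BUILD_SEGMENTS, SEGMENT_AVAILABILITY_WAIT, COMPLETED }

class IngestionStateSketch {
  static SketchState stateFor(TaskPhase phase) {
    switch (phase) {
      case READING:
      case BUILDING_FINAL_SEGMENTS:
        // Finishing the read does not mean the task is done:
        // it is still building the final set of segments to publish.
        return SketchState.BUILD_SEGMENTS;
      case WAITING_FOR_HANDOFF:
        // Segments are published; the task is waiting for handoff.
        return SketchState.SEGMENT_AVAILABILITY_WAIT;
      case EXITING:
        // Reported only immediately before the task exits.
        return SketchState.COMPLETED;
      default:
        throw new IllegalArgumentException("Unknown phase: " + phase);
    }
  }
}
```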
* Lower,Upper,Lpad,Rpad,Parse_long
* up to REGEXP_EXTRACT
* batch 07 ready for review
* updated definitions in scalar
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* rpad and lpad
* addressing comments
* minor fixes
* improving examples based on suggestions
* matched -> matches
* correcting typo
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Add type coercion and null check to left, right, repeat exprs.
These exprs shouldn't validate types; they should coerce types. Coercion
is typical behavior for functions because it enables schema evolution.
The functions are also modified to check isNumericNull on the right-hand
argument. This was missing previously, which would erroneously cause
nulls to be treated as zeroes.
* Fix tests.
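A rough sketch of coercion plus a null check on the right-hand argument, using a `left`-style function as the example. The class and method names are hypothetical, this is not the Druid expression code, and it assumes that a null or non-numeric length should produce a null result rather than being treated as zero.

```java
// Hypothetical sketch of "coerce, then check for null" on the right-hand argument.
class LeftExprSketch {
  // Coerces numbers and numeric strings to Long; returns null when there is no numeric value.
  static Long coerceToLong(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    try {
      return Long.parseLong(value.toString().trim());
    } catch (NumberFormatException e) {
      return null;
    }
  }

  static String left(String s, Object length) {
    Long n = coerceToLong(length);
    if (s == null || n == null) {
      // Assumption: a null length yields null instead of behaving like left(s, 0).
      return null;
    }
    if (n < 0) {
      throw new IllegalArgumentException("length must be non-negative");
    }
    return s.substring(0, (int) Math.min(n, s.length()));
  }
}
```

Here `left("druid", "3")` is accepted because the string is coerced to a number, while `left("druid", null)` returns null instead of an empty string.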
The specific error on a truncated file can vary based on how the final
frame of the truncated file is written. This patch loosens the check so
it passes regardless of how the truncated file is written.
A follow-up PR for #16864. Just renames dimensionToSchemaMap to dimensionSchemas and always overrides ARRAY_INGEST_MODE context value to array for MSQ compaction.
Reduction of nullable DATE and TIMESTAMP expressions did not perform
a necessary null check, so in some cases the expression was reduced to
1970-01-01 00:00:00 (epoch) rather than NULL.
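One way this kind of bug can arise is sketched below: a null value that is defaulted to zero milliseconds becomes the epoch. The class and method names are hypothetical, and how the actual reducer represents values is not stated here.

```java
import java.time.Instant;

// Hypothetical sketch contrasting a reduction with and without the null check.
class NullableTimestampSketch {
  // Buggy shape: a missing value is defaulted to 0 ms, which is the epoch.
  static Instant reduceWithoutNullCheck(Long millis) {
    long value = (millis == null) ? 0L : millis;
    return Instant.ofEpochMilli(value);
  }

  // Fixed shape: NULL stays NULL.
  static Instant reduceWithNullCheck(Long millis) {
    return millis == null ? null : Instant.ofEpochMilli(millis);
  }
}
```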
changes:
* Added `CursorBuildSpec` which captures all of the 'interesting' stuff that goes into producing a cursor as a replacement for the method arguments of `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor`
* added new interface `CursorHolder` and new interface `CursorHolderFactory` as a replacement for `CursorFactory`, with method `makeCursorHolder`, which takes a `CursorBuildSpec` as an argument and replaces `CursorFactory.canVectorize`, `CursorFactory.makeCursor`, and `CursorFactory.makeVectorCursor`
* `CursorFactory.makeCursors` previously returned a `Sequence<Cursor>` corresponding to the query granularity buckets, with a separate `Cursor` per bucket. `CursorHolder.asCursor` instead returns a single `Cursor` (equivalent to 'ALL' granularity), and a new `CursorGranularizer` has been added for query engines to iterate over the cursor and divide it into granularity buckets. This makes the non-vectorized engine behave the same way as the vectorized query engine (with its `VectorCursorGranularizer`), and simplifies code that reads segments, particularly code that does not care about bucketing the results into granularities.
* Deprecated `CursorFactory`, `CursorFactory.canVectorize`, `CursorFactory.makeCursors`, and `CursorFactory.makeVectorCursor`
* updated all `StorageAdapter` implementations to implement `makeCursorHolder`, and transitioned direct `CursorFactory` implementations to implement `CursorMakerFactory` instead. `StorageAdapter` being a `CursorMakerFactory` is intended to be transitional; ideally it will not be released in this form, in favor of fetching `CursorMakerFactory` directly from `Segment`. This PR was already large enough, so that change will be done in a follow-up.
* updated all query engines to use `makeCursorHolder`, granularity based engines to use `CursorGranularizer`.
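The idea of reading one cursor and dividing its rows into granularity buckets afterwards can be sketched as below. This is not the `CursorGranularizer` API: the class, the method, and the use of a fixed bucket width in milliseconds are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one pass over all row timestamps (the single "ALL" cursor),
// with each row assigned to a bucket keyed by the bucket's start time.
class GranularizerSketch {
  static Map<Long, List<Long>> bucket(long[] rowTimes, long bucketMillis) {
    Map<Long, List<Long>> buckets = new LinkedHashMap<>();
    for (long t : rowTimes) {
      // floorDiv keeps bucket starts correct for timestamps before the epoch.
      long start = Math.floorDiv(t, bucketMillis) * bucketMillis;
      buckets.computeIfAbsent(start, k -> new ArrayList<>()).add(t);
    }
    return buckets;
  }
}
```

An engine that does not care about granularity can skip the bucketing step entirely and consume the rows directly, which is the simplification described in the list above.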
Upgrades or downgrades involving any version up to and including Druid 30 can fail when the newer version runs a worker task while the older version runs a controller task. This patch removes that verification check until it is safe to add it back.
Handle the following Delta complex types:
a. StructType as JSON
b. ArrayType as Java list
c. MapType as Java map
Generate and add a new Delta table complex-types-table that contains the above complex types for testing.
Update the tests to include a parameterized test with complex-types-table, with the expectations defined in ComplexTypesDeltaTable.java.
* Fix build
* Run coldSchemaExec thread periodically
* Bugfix: Run cold schema refresh periodically
* Rename metrics for deep storage only segment schema process
* Fix typo in waitUntilSegmentsLoad.
* Add a note on configuring druid.segmentCache.locations for broadcast rules.
* Update docs/operations/rule-configuration.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
---------
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* Use fuzzy matchers for compaction bytes asserts.
This still enables us to test that the bytes are zero and nonzero
when they're supposed to be, without having to get them exactly
right. The need to get bytes exactly right makes it difficult to
ensure ITs pass when making changes to default segment metadata.
* Additional fuzziness.
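A fuzzy check of this kind can be sketched with plain predicates, as below. The names are hypothetical and this is not the matcher code used by the integration tests; it only illustrates asserting "zero" or "nonzero" instead of an exact byte count.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch: assert on the shape of a byte count rather than its exact value.
class FuzzyBytesSketch {
  static final LongPredicate ZERO = v -> v == 0;
  static final LongPredicate POSITIVE = v -> v > 0;

  static void check(String field, long actual, LongPredicate expected) {
    if (!expected.test(actual)) {
      throw new AssertionError(field + " had unexpected value " + actual);
    }
  }
}
```

A call such as `FuzzyBytesSketch.check("bytesCompacted", 12345L, FuzzyBytesSketch.POSITIVE)` keeps passing when default segment metadata changes the exact size; the field name here is made up.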
Bug description:
Peons fail to start up when `WorkerTaskCountStatsMonitor` is used on MiddleManagers.
This is because MiddleManagers pass on their properties to peons and peons are unable to
find `IndexerTaskCountStatsProvider` as that is bound only for indexer nodes.
Fix:
Check if node is an indexer before trying to get instance of `IndexerTaskCountStatsProvider`.
Fixes #13936
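The shape of the fix can be sketched as a guarded lookup. Everything here is hypothetical (a map stands in for the injector, and roles are plain strings); it only illustrates checking the node role before asking for a binding that exists on indexers alone.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: only look up the indexer-only provider when running on an indexer.
class MonitorBindingSketch {
  // Stand-in for an injector lookup that fails when nothing is bound for the key.
  static Object lookup(Map<String, Object> bindings, String key) {
    Object value = bindings.get(key);
    if (value == null) {
      throw new IllegalStateException("No binding for " + key);
    }
    return value;
  }

  static Optional<Object> indexerStatsProvider(Set<String> nodeRoles, Map<String, Object> bindings) {
    if (!nodeRoles.contains("indexer")) {
      // Peons and MiddleManagers skip the lookup, so startup cannot fail on it.
      return Optional.empty();
    }
    return Optional.of(lookup(bindings, "IndexerTaskCountStatsProvider"));
  }
}
```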
In cases where a supervisor is idle and the overlord is restarted for some reason, the supervisor would
start spinning up tasks again. In clusters where there are many low throughput streams, this would spike
the task count unnecessarily.
This commit compares the latest stream offsets with the ones in metadata during supervisor startup
and sets the supervisor to the idle state if they match.
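The startup comparison can be sketched as below. The class, the method, and the representation of offsets as a partition-to-offset map are assumptions for illustration, not the supervisor's actual code.

```java
import java.util.Map;

// Hypothetical sketch: start idle when the stream has nothing beyond what metadata records.
class IdleCheckSketch {
  static boolean shouldStartIdle(Map<String, Long> latestStreamOffsets,
                                 Map<String, Long> metadataOffsets) {
    // Map.equals compares key sets and values, so every partition must match exactly.
    return latestStreamOffsets.equals(metadataOffsets);
  }
}
```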
* SQL syntax error should target USER persona
* revert change to queryHandler and related tests, based on review comments
* add test
* Properly handle Druid schema blending with catalog definition and segment metadata
* add javadocs