* Docs: improve druid.coordinator.kill.on description
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update description for durationToRetain
* Update docs/configuration/index.md
* Update after review
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* [Docs] Improve Bloom filter topic
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update spelling file
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This change is to emit following metrics as part of GroupByStatsMonitor monitor,
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes-> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
All JDK 8 based CI checks have been removed.
Images used in Dockerfile(s) have been updated to Java 17 based images.
Documentation has been updated accordingly.
* SQL syntax error should target USER persona
* * revert change to queryHandler and related tests, based on review comments
* * add test
* Add documentation for druid-catalog extension
* * fix error
* * fix error
* Apply suggestions from code review
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
* * fix spelling error
* * fix spelling
---------
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
* Add a wait on start() for task lifecycle to go into running
* handle exceptions
* Fix logging messages
* Don't pass in the settable future as a arg
* add some unit tests
Add build instructions for developers
Follow up from issue #17375, add instructions solely for distribution profile. Note that this build command is mostly used by me, everyone is welcome to add further optimizations for a faster distribution build.
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
* Update docs/development/build.md
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
* Update docs/development/build.md
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
---------
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Map Lookup Introspection API endpoints /keys and /values no longer return an invalid JSON object.
Also, update documentation to clarify the version returned by the /version introspection endpoint.
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
Currently, durable storage and export both require configuring a temporary directory to be used using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.
Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.
Please note that preference would be given to the user configured, druid.*.storage.temp*Dir, on the tasks. If that is not configured, we then use the configured temporary directory.
Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.
* from start to step 3 of Ingest data using Theta sketche
* updated upto "Query the Theta sketch column"
* fixed sentence
* another typo
* using sql ingestion instead of batch-sql
* waiting for explanations on DS_THETA
* Revert "using sql ingestion instead of batch-sql"
This reverts commit b95fcb9b32.
* Revert "using sql ingestion instead of batch-sql"
This reverts commit b95fcb9b32.
* just copy and pasting to where I was
* updated tutorial
* fixing images, and removing unused
* slightly updating explanatio
* Update docs/tutorials/tutorial-sketches-theta.md
* Apply suggestions from code review
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* addressing comments in review
* made filter clause consitent with other instances
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
changes:
adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
add cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec).
To workaround this, the web-console and other tools need to further inspect the sample data returned to sample data returned by the Druid sampler API to parse them as numbers.
This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.
Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp:
"interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493