druid

Commit Graph

Author	SHA1	Message	Date
imply-cheddar	089d8da561	Support Framing for Window Aggregations (#13514 ) * Support Framing for Window Aggregations This adds support for framing over ROWS for window aggregations. Still not implemented as yet: 1. RANGE frames 2. Multiple different frames in the same query 3. Frames on last/first functions	2022-12-14 18:04:39 -08:00
Vadim Ogievetsky	2729e25295	Link to java docs (#13478 ) * add link to page about selecting a JRE * add link to script also * simplify text	2022-12-14 11:45:23 -08:00
Rohan Garg	35c983a351	Use template file for adding table functions grammar (#13553 )	2022-12-14 21:52:09 +05:30
Kashif Faraz	58a3acc2c4	Add InputStats to track bytes processed by a task (#13520 ) This commit adds a new class `InputStats` to track the total bytes processed by a task. The field `processedBytes` is published in task reports along with other row stats. Major changes: - Add class `InputStats` to track processed bytes - Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes. > Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost. - Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask` - Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method. - Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports Other changes: - Update tests to verify the value of `processedBytes` - Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class - Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager` - Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`	2022-12-13 18:54:42 +05:30
somu-imply	7682b0b6b1	Analysis refactor (#13501 ) Refactor DataSource to have a getAnalysis method() This removes various parts of the code where while loops and instanceof checks were being used to walk through the structure of DataSource objects in order to build a DataSourceAnalysis. Instead we just ask the DataSource for its analysis and allow the stack to rebuild whatever structure existed.	2022-12-12 17:35:44 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
Rishabh Singh	8e386072e9	Druid automated quickstart: zookeeper in service list (#13550 )	2022-12-12 10:29:43 -08:00
Karan Kumar	5a3d79a5d5	Removing unused exec service. (#13541 )	2022-12-12 14:39:42 +05:30
Clint Wylie	7002ecd303	add protobuf flattener, direct to plain java conversion for faster flattening (#13519 ) * add protobuf flattener, direct to plain java conversion for faster flattening, nested column tests	2022-12-09 12:24:21 -08:00
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove examples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * update spelling * update .spelling * minor change * Minor change * memory computation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Gian Merlino	55814888f5	MSQ: Only look at sqlInsertSegmentGranularity on the outer query. (#13537 ) The planner sets sqlInsertSegmentGranularity in its context when using PARTITIONED BY, which sets it on every native query in the stack (as all native queries for a SQL query typically have the same context). QueryKit would interpret that as a request to configure bucketing for all native queries. This isn't useful, as bucketing is only used for the penultimate stage in INSERT / REPLACE. So, this patch modifies QueryKit to only look at sqlInsertSegmentGranularity on the outermost query. As an additional change, this patch switches the static ObjectMapper to use the processwide ObjectMapper for deserializing Granularities. Saves an ObjectMapper instance, and ensures that if there are any special serdes registered for Granularity, we'll pick them up.	2022-12-09 20:48:16 +05:30
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
Vadim Ogievetsky	d8e27eaab4	update error anchors (#13527 )	2022-12-08 13:18:35 -08:00
Gian Merlino	91ef9872ec	MSQ: Improve TooManyBuckets error message, improve error docs. (#13525 ) 1) Edited the TooManyBuckets error message to mention PARTITIONED BY instead of segmentGranularity. 2) Added error-code-specific anchors in the docs. 3) Add information to various error codes in the docs about common causes and solutions.	2022-12-08 13:18:26 -08:00
Vadim Ogievetsky	d85fb8cc4e	Web console: improve compaction status display (#13523 ) * improve compaction status display * even more accurate * fix snapshot	2022-12-07 21:03:59 -08:00
Adarsh Sanjeev	fbf76ad8f5	Remove stray reference to fix OOM while merging sketches (#13475 ) * Remove stray reference to fix OOM while merging sketches * Update future to add result from executor service * Update tests and address review comments * Address review comments * Moved mock * Close threadpool on teardown * Remove worker task cancel	2022-12-08 07:17:55 +05:30
Kashif Faraz	69951273b8	Fix typo in metric name (#13521 )	2022-12-08 06:41:23 +05:30
Jill Osborne	b56855b837	Update to native ingestion doc (#13482 ) * Update to native ingestion doc * Update docs/ingestion/native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-12-07 15:08:19 +05:30
Vadim Ogievetsky	9679f6a9b5	Web console: add arrayOfDoublesSketch and other small fixes (#13486 ) * add padding and keywords * add arrayOfDoubles * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * partition int * fix docs Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-06 21:21:49 -08:00
Kashif Faraz	c7229fc787	Limit max batch size for segment allocation, add docs (#13503 ) Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`	2022-12-07 10:07:14 +05:30
Abhishek Agarwal	b25cf216d5	Better error message when theta_sketch_intersect is used on scalar expression (#13508 )	2022-12-07 09:35:43 +05:30
Clint Wylie	37d8833125	fix bug with broker parallel merge metrics emitting, add wall time, fast/slow partition time metrics (#13420 )	2022-12-06 17:50:59 -08:00
imply-cheddar	83261f9641	Starting on Window Functions (#13458 ) * Processors for Window Processing This is an initial take on how to use Processors for Window Processing. A Processor is an interface that transforms RowsAndColumns objects. RowsAndColumns objects are essentially combinations of rows and columns. The intention is that these Processors are the start of a set of operators that more closely resemble what DB engineers would be accustomed to seeing. * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Some SQL tests for window functions. Added wikipedia data to the indexes available to the SQL queries and tests validating the windowing functionality as it exists now. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2022-12-06 15:54:05 -08:00
Clint Wylie	cf472162a6	fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http (#13499 ) * fix issue with http server inventory view blocking data node http server shutdown with long polling * adjust * fix test inspections	2022-12-06 15:52:44 -08:00
Tejaswini Bandlamudi	136322d13b	clean install before license checks (#13502 )	2022-12-05 22:38:03 -08:00
Gian Merlino	fda0a1aadd	Set chatAsync default to true. (#13491 ) This functionality was originally added in #13354.	2022-12-05 20:53:59 -08:00
AmatyaAvadhanula	658a9c2d35	Early stop on failed start (Alternative to #13087 ) (#13258 ) * Make halt configurable. Don't halt in tests	2022-12-05 21:05:07 +05:30
Kashif Faraz	65945a686f	Docs: Update docs for coordinator dynamic config (#13494 ) * Update docs for useBatchedSegmentSampler * Update docs for round robin assignment	2022-12-05 16:53:10 +05:30
TSFenwick	10bec54acc	Switching emitter. This will allow for a per feed emitter designation. (#13363 ) * Switching emitter. This will allow for a per feed emitter designation. This will work by looking at an event's feed and direct it to a specific emitter. If no specific feed is specified for a feed. The emitter can direct the event to a default emitter. * fix checkstyle issues and make docs for switching emitter use basic event feeds * fix broken docs, add test, and guard against misconfigurations * add module test add switching emitter module test * fix broken SwitchingEmitterModuleTest * add apache license to top of test * fix checkstyle issues * address comments by adding javadocs, removing a todo, and making druid docs more clear	2022-12-05 16:04:34 +05:30
Kashif Faraz	45a8fa280c	Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369 ) In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions on the overlord can often take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in lag building up while a task waits for a segment to get allocated. The root causes are: - large number of metadata calls made to the segments and pending segments tables - `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments Since the contention typically arises when several tasks of the same datasource try to allocate segments for the same interval/granularity, the allocation run times can be improved by batching the requests together. Changes - Add flags - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`) - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`) - Add methods `canPerformAsync` and `performAsync` to `TaskAction` - Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch - Process batch after `batchAllocationMaxWaitTime` - Acquire `giant` lock just once per batch in `TaskLockbox` - Reduce metadata calls by batching statements together and updating query filters - Except for batching, retain the whole behaviour (order of steps, retries, etc.) - Respond to leadership changes and fail items in queue when not leader - Emit batch and request level metrics	2022-12-05 14:00:07 +05:30
somu-imply	9177419628	Unnest functionality for Druid (#13268 ) * Moving all unnest cursor code atop refactored code for unnest * Updating unnest cursor * Removing dedup and fixing up some null checks * AllowList changes * Fixing some NPEs * Using bitset for allowlist * Updating the initialization only when cursor is in non-done state * Updating code to skip rows not in allow list * Adding a flag for cases when first element is not in allowed list * Updating for a null in allowList * Splitting unnest cursor into 2 subclasses * Intercepting some apis with columnName for new unnested column * Adding test cases and renaming some stuff * checkstyle fixes * Moving to an interface for Unnest * handling null rows in a dimension * Updating cursors after comments part-1 * Addressing comments and adding some more tests * Reverting a change to ScanQueryRunner and improving a comment * removing an unused function * Updating cursors after comments part 2 * One last fix for review comments * Making some functions private, deleting some comments, adding a test for unnest of unnest with allowList * Adding an exception for a case * Closure for unnest data source * Adding some javadocs * One minor change in makeDimSelector of columnarCursor * Updating an error message * Update processing/src/main/java/org/apache/druid/segment/DimensionUnnestCursor.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Unnesting on virtual columns was missing an object array, adding that to support virtual columns unnesting * Updating exceptions to use UOE * Renamed files, added column capability test on adapter, return statement and made unnest datasource not cacheable for the time being * Handling for null values in dim selector * Fixing a NPE for null row * Updating capabilities * Updating capabilities Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-12-02 18:48:25 -08:00
Katya Macedo	78c1a2bd66	Remove limit from timeseries (#13457 ) CI build failures seem unrelated to docs	2022-12-02 12:19:59 -08:00
Paul Rogers	b76ff16d00	SQL test framework extensions (#13426 ) SQL test framework extensions * Capture planner artifacts: logical plan, etc. * Planner test builder validates the logical plan * Validation for the SQL result schema (we already have validation for the Druid row signature) * Better Guice integration: properties, reuse Guice modules * Avoid need for hand-coded expr, macro tables * Retire some of the test-specific query component creation * Fix query log hook race condition	2022-12-02 09:11:59 -08:00
Tejaswini Bandlamudi	30498c1f98	Update gha & travis checks (#13412 ) * update static-checks GHA to run sequentially remove static-checks from travis.yml move docs, web-console, packaging checks from travis to GHA * nit * nit * groups all checks, runs on 8, 11, 17 jdks * nit * adds license info * update permissions on scripts folder * nit * nit * fix packaging check * changes naming, cleans repo before license checks * simulate failure * bump up license checks * test license checks failure * test license checks failure * test license checks failure * verify gha script run exit code * fail fast in case of shell script * verified fail fast in case of shell script	2022-12-02 15:06:31 +05:30
Jill Osborne	138a6de507	Update nested columns docs (#13461 ) * Update nested columns docs (cherry picked from commit `04206c5179`) * Update nested-columns.md (cherry picked from commit `8085ee7217`)	2022-12-01 10:47:32 -08:00
AmatyaAvadhanula	cc307e4c29	Fix needless task shutdown on leader switch (#13411 ) * Fix needless task shutdown on leader switch * Add unit test * Fix style * Fix UTs	2022-12-01 18:31:08 +05:30
abhagraw	f6f625ee08	MSQ Reindex IT (#13433 ) * MSQ Reindex IT * Fixing checkstyle errors * Addressing comments * Addressing comments	2022-12-01 12:13:23 +05:30
Adarsh Sanjeev	8395273099	Add unit tests for MSQ ingestion faults (#13439 ) * Add unit tests for MSQ ingestion faults * Resolve build failure * Move test to MSQFaultTest * Rename test	2022-12-01 10:11:49 +05:30
Adarsh Sanjeev	2f3b97194f	Fix hardcoded version in pom file (#13460 )	2022-12-01 10:10:04 +05:30
Vadim Ogievetsky	2fdcfffe40	don't render duration if aggregated (#13455 )	2022-11-30 19:21:07 -08:00
317brian	cc2e4a80ff	doc: add a basic JDBC tutorial (#13343 ) * initial commit for jdbc tutorial (cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb) * add commentary * address comments from charles * add query context to example * fix typo * add links * Apply suggestions from code review Co-authored-by: Frank Chen <frankchen@apache.org> * fix datatype * address feedback * add parameterize to spelling file. the past tense version was already there Co-authored-by: Frank Chen <frankchen@apache.org>	2022-11-30 16:25:35 -08:00
xiaokang	6ba35f6d59	update org.bouncycastle:bcprov-jdk15on 1.68 to 1.69 (#13440 )	2022-11-30 21:57:38 +05:30
Adarsh Sanjeev	af164cbc10	Fix an issue with WorkerSketchFetcher not terminating on shutdown (#13459 ) * Fix an issue with WorkerSketchFetcher not terminating on shutdown * Change threadpool name	2022-11-30 21:02:48 +05:30
Kashif Faraz	8ff1b2d5d4	Revert "Add filter in cloud object input source for backward compatibility (#13437 )" (#13450 ) This reverts commit `b12e5f300e`.	2022-11-30 16:33:05 +05:30
Jill Osborne	291ded22d5	Update experimental features doc (#13452 )	2022-11-30 16:14:43 +05:30
Gian Merlino	50963edcae	Fix compile error in MSQSelectTest. (#13456 )	2022-11-29 15:51:03 -08:00
Jill Osborne	5c520e0cf9	Update LDAP configuration docs (#13245 ) * Update LDAP configuration docs * Updated after review * Update auth-ldap.md Updated. * Update auth-ldap.md * Updated spelling file * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-29 09:26:32 -08:00
Laksh Singla	79df11c16c	Improve unit test coverage for MSQ (#13398 ) * add faults tests for the multi stage query * add too many partitions fault * add toomanyinputfilesfault * programmatically generate the file * refactor * Trigger Build	2022-11-29 17:27:04 +05:30
Laksh Singla	4ed6255bdf	Convert errors based on implicit type conversion in multi value arrays to parse exception in MSQ (#13366 ) * initial commit * fix test * push the json changes * reduce the area of the try..catch * Trigger Build * review	2022-11-29 17:19:57 +05:30
Karan Kumar	edd076ca69	Remove duplicate FrameRowTooLargeException.java (#13441 ) * Removing duplicate FrameRowTooLargeException.java * Fixing intellij inspection	2022-11-29 08:46:59 +05:30
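
Commit 089d8da561 (#13514) adds ROWS framing for window aggregations. To make the term concrete, here is a minimal Python sketch of a windowed SUM over a ROWS frame inside one already-ordered partition, following standard SQL ROWS semantics. It is not Druid's implementation (which is Java); `rows_frame_sum` and its `preceding`/`following` parameters are hypothetical names, both offsets are assumed to be non-negative, and `None` stands for UNBOUNDED.

```python
from typing import List, Optional


def rows_frame_sum(values: List[float],
                   preceding: Optional[int],
                   following: Optional[int]) -> List[float]:
    """SUM over a ROWS frame for every row of one ordered partition.

    preceding/following are non-negative row offsets; None means UNBOUNDED.
    Frame bounds are clipped to the edges of the partition.
    """
    n = len(values)
    # prefix[k] == sum(values[:k]), so any contiguous frame sums in O(1).
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    result = []
    for i in range(n):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n - 1 if following is None else min(n - 1, i + following)
        result.append(prefix[hi + 1] - prefix[lo])
    return result
```

For example, `rows_frame_sum([1, 2, 3, 4], 1, 1)` (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) yields `[3.0, 6.0, 9.0, 7.0]`, and `rows_frame_sum([1, 2, 3, 4], None, 0)` (UNBOUNDED PRECEDING to CURRENT ROW) gives the running total `[1.0, 3.0, 6.0, 10.0]`. The prefix-sum trick works because SUM is invertible; aggregations such as MIN or MAX would need a different approach.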
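
Commits 45a8fa280c (#13369) and c7229fc787 (#13503) describe batching SegmentAllocateActions that target the same datasource and interval/granularity, with each batch capped at 500 requests. The Python sketch below illustrates only that grouping step. It assumes the batch key is (datasource, interval, granularity), as the commit message suggests; `AllocateRequest`, `batch_requests`, and the plain-string interval are hypothetical, and the wait-time flush (`batchAllocationWaitTime`), locking, and metadata calls are not modeled.

```python
from typing import Dict, List, NamedTuple, Tuple


class AllocateRequest(NamedTuple):
    task_id: str
    datasource: str
    interval: str     # hypothetical: interval kept as a plain string, e.g. "2022-12-01/2022-12-02"
    granularity: str  # hypothetical: e.g. "DAY"


MAX_BATCH_SIZE = 500  # cap on batch size mentioned in #13503


def batch_requests(requests: List[AllocateRequest],
                   max_batch_size: int = MAX_BATCH_SIZE) -> List[List[AllocateRequest]]:
    """Group requests sharing (datasource, interval, granularity) and split
    each group into batches of at most max_batch_size, keeping arrival order."""
    groups: Dict[Tuple[str, str, str], List[AllocateRequest]] = {}
    for req in requests:
        key = (req.datasource, req.interval, req.granularity)
        groups.setdefault(key, []).append(req)

    # dicts preserve insertion order (Python 3.7+), so groups appear in
    # the order their first request arrived.
    batches: List[List[AllocateRequest]] = []
    for group in groups.values():
        for start in range(0, len(group), max_batch_size):
            batches.append(group[start:start + max_batch_size])
    return batches
```

With `max_batch_size=2`, three requests for one interval plus one request for another interval produce batches of sizes `[2, 1, 1]`; each batch can then be handled under a single lock acquisition, which is the contention win the commit describes.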

12496 Commits