druid

Commit Graph

Author	SHA1	Message	Date
Jill Osborne	b4d83a86c2	Middle Manager wording update in docs (#17005 )	2024-09-05 10:25:30 -07:00
Charles Smith	e562dd3ac6	Docs: note on iceberg (#16955 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-08-27 14:27:23 -07:00
AmatyaAvadhanula	8c8a4b2302	Remove references to chatAsync (#16950 ) Remove references to chatAsync from Rabbit stream supervisors	2024-08-23 13:21:07 +05:30
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
YongGang	4b293fc2a9	Docs: Fix k8s dynamic config URL (#16720 )	2024-07-11 10:05:47 +05:30
Suneet Saldanha	4e0ea7823b	Update docs for K8s TaskRunner Dynamic Config (#16600 ) * Update docs for K8s TaskRunner Dynamic Config * touchups * code review * npe * oopsies	2024-06-21 06:01:59 -07:00
YongGang	46dbc74053	Support Dynamic Peon Pod Template Selection in K8s extension (#16510 ) * initial commit * add Javadocs * refine JSON input config * more test and fix build * extract existing behavior as default strategy * change template mapping fallback * add docs * update doc * fix doc * address comments * define Matcher interface * fix test coverage * use lower case for endpoint path * update Json name * add more tests * refactoring Selector class	2024-06-12 15:27:10 -07:00
Vadim Ogievetsky	a124c6cbbd	fix typo in extension name (#16466 )	2024-05-20 09:47:22 +08:00
Abhishek Radhakrishnan	1d7595f3f7	Support for filters in the Druid Delta Lake connector (#16288 ) * Delta Lake support for filters. * Updates * cleanup comments * Docs * Remmove Enclosed runner * Rename * Cleanup test * Serde test for the Delta input source and fix jackson annotation. * Updates and docs. * Update error messages to be clearer * Fixes * Handle NumberFormatException to provide a nicer error message. * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Doc fixes based on feedback * Yes -> yes in docs; reword slightly. * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Documentation, javadoc and more updates. * Not with an or expression end-to-end test. * Break up =, >, >=, <, <= into its own types instead of sub-classing. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2024-04-29 11:31:36 -07:00
Atul Mohan	77333e56fa	Docs: Add missing kafka emitter config (#16332 )	2024-04-25 10:37:14 +05:30
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) Statsd client sometimes drops metrics when this queueSize of statsd client with max unprocessed messages is completely full. This causes some high cardinality metrics like per partition lag being droppped. There are multiple parameters of statsdclient that can be initialized and can help increase the load/capacity of client to not to drop metrics more frequently. Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime	2024-04-17 18:35:31 +05:30
Jill Osborne	67ae0ff450	Update docs for rabbit community extension (#16069 ) * Updated docs for rabbit community extension * Updated after review	2024-03-07 11:29:53 -08:00
Sensor	4e9b758661	Support CPU resource configurable for Kubernates job under MoK Mode (#16008 ) * support CPU resource configurable for Kubernates job * update property doc * fix test name * refine doc format	2024-03-04 10:12:09 -05:00
George Shiqi Wu	59bb72a926	Fix parsing of env variables when properties have underscores (#15919 ) * Fix parsing of env variables when properties have underscores * Add documentation * Use a % sign instead	2024-02-21 13:18:21 -05:00
Tom	11a8624ef1	allow for kafka-emitter to have extra dimensions be set for each event it emits (#15845 ) * allow for kafka-emitter to have extra dimensions be set for each event it emits * fix checktsyle issue in kafkaemitterconfig * make changes to fix docs, and cleanup copy paste error in #toString() * undo formatting to markdown table * add more branches so test passes * fix checkstyle issue	2024-02-08 22:55:24 -08:00
Abhishek Radhakrishnan	1a5b57df84	Update `groupId` for delta-lake and iceberg extensions (#15843 ) * Update the group id to org.apache.druid.extensions.contrib for contrib exts. * Note iceberg and delta lake extensions in extensions.md * properties and shell backticks * Update groupId in distribution/pom.xml * remove delta-lake from dist. * Add note on downloading extension.	2024-02-07 23:54:06 -08:00
Abhishek Radhakrishnan	9f95a691f7	Extension to read and ingest Delta Lake tables (#15755 ) * something * test commit * compilation fix * more compilation fixes (fixme placeholders) * Comment out druid-kereberos build since it conflicts with newly added transitive deps from delta-lake Will need to sort out the dependencies later. * checkpoint * remove snapshot schema since we can get schema from the row * iterator bug fix * json json json * sampler flow * empty impls for read(InputStats) and sample() * conversion? * conversion, without timestamp * Web console changes to show Delta Lake * Asset bug fix and tile load * Add missing pieces to input source info, etc. * fix stuff * Use a different delta lake asset * Delta lake extension dependencies * Cleanup * Add InputSource, module init and helper code to process delta files. * Test init * Checkpoint changes * Test resources and updates * some fixes * move to the correct package * More tests * Test cleanup * TODOs * Test updates * requirements and javadocs * Adjust dependencies * Update readme * Bump up version * fixup typo in deps * forbidden api and checkstyle checks * Trim down dependencies * new lines * Fixup Intellij inspections. * Add equals() and hashCode() * chain splits, intellij inspections * review comments and todo placeholder * fix up some docs * null table path and test dependencies. Fixup broken link. * run prettify * Different test; fixes * Upgrade pyspark and delta-spark to latest (3.5.0 and 3.0.0) and regenerate tests * yank the old test resource. * add a couple of sad path tests * Updates to readme based on latest. * Version support * Extract Delta DateTime converstions to DeltaTimeUtils class and add test * More comprehensive split tests. * Some test renames. * Cleanup and update instructions. * add pruneSchema() optimization for table scans. * Oops, missed the parquet files. * Update default table and rename schema constants. * Test setup and misc changes. * Add class loader logic as the context class loader is unaware about extension classes * change some table client creation logic. * Add hadoop-aws, hadoop-common and related exclusions. * Remove org.apache.hadoop:hadoop-common * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Add entry to .spelling to fix docs static check --------- Co-authored-by: abhishekagarwal87 <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-30 21:53:50 -08:00
Hiroshi Fukada	3fe3a65344	New: Add DDSketch in extensions-contrib (#15049 ) * New: Add DDSketch-Druid extension - Based off of http://www.vldb.org/pvldb/vol12/p2195-masson.pdf and uses the corresponding https://github.com/DataDog/sketches-java library - contains tests for post building and using aggregation/post aggregation. - New aggregator: `ddSketch` - New post aggregators: `quantileFromDDSketch` and `quantilesFromDDSketch` * Fixing easy CodeQL warnings/errors * Fixing docs, and dependencies Also moved aggregator ids to AggregatorUtil and PostAggregatorIds * Adding more Docs and better null/empty handling for aggregators * Fixing docs, and pom version * DDSketch documentation format and wording	2024-01-23 20:17:07 +05:30
Ben Sykes	e49a7bb3cd	Add SpectatorHistogram extension (#15340 ) * Add SpectatorHistogram extension * Clarify documentation Cleanup comments * Use ColumnValueSelector directly so that we support being queried as a Number using longSum or doubleSum aggregators as well as a histogram. When queried as a Number, we're returning the count of entries in the histogram. * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Fix references * Fix spelling * Update docs/development/extensions-contrib/spectator-histogram.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-14 09:52:30 -08:00
Abhishek Radhakrishnan	f0f428274a	Prometheus config property doc fixup (#15613 ) * Minor fixes * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-01-02 16:28:42 -08:00
Bartosz Mikulski	4670a7650f	Optional removal of metrics from Prometheus PushGateway on shutdown (#14935 ) * Optional removal of metrics from Prometheus PushGateway on shutdown * Make pushGatewayDeleteOnShutdown property nullable * Add waitForShutdownDelay property * Fix unit test * Address PR comments * Address PR comments * Add explanation on why it is useful to have deletePushGatewayMetricsOnShutdown * Fix spelling error * Fix spelling error	2023-12-13 11:58:53 -05:00
YongGang	3a3d37ef40	Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347 ) * fix segment/count metric in Statsd-emitter * update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/statsd.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-11-10 08:08:58 -08:00
YongGang	7a25ee4fd9	Ability to send task types to k8s or worker task runner (#15196 ) * Ability to send task types to k8s or worker task runner * add more tests * use runnerStrategy to determine task runner * minor refine * refine runner strategy config * move workerType config to upper level * validate config when application start	2023-10-25 09:55:56 -07:00
George Shiqi Wu	f773d83914	Mixed task runner for migration to mm-less ingestion (#14918 ) * save work * Working * Fix runner constructor * Working runner * extra log lines * try using lifecycle for everything * clean up configs * cleanup /workers call * Use a single config * Allow selecting runner * debug changes * Work on composite task runner * Unit tests running * Add documentation * Add some javadocs * Fix spelling * Use standard libraries * code review * fix * fix * use taskRunner as string * checkstyl --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-09-11 18:09:46 -07:00
John Gerassimou	d201ea0ece	prometheus-emitter: add extraLabels parameter (#14728 ) * prometheus-emitter: add extraLabels parameter * prometheus-emitter: update readme to include the extraLabels parameter * prometheus-emitter: remove nullable and surface label name issues * remove import to make linter happy	2023-08-29 12:02:22 -07:00
YongGang	3954685aae	Report more metrics to monitor K8s task runner (#14771 ) * Report pod running metrics to monitor K8s task runner * refine method definition * fix checkstyle * implement task metrics * more comment * address comments * update doc for the new metrics reported * fix checkstyle * refine method definition * minor refine	2023-08-16 14:03:53 -04:00
George Shiqi Wu	174053f4fd	Add readme for kubernetes-overlord-extensions and update docs (#14674 ) * Add readme for kubernetes task scheduler * clean up uneeded stuff * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Move documentation into main page * indentation * cleanup spellcheck errors * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * PR comments * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-08-01 13:29:44 -07:00
Atul Mohan	03d6d395a0	Extension to read and ingest iceberg data files (#14329 ) This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location. Two important dependencies associated with Apache Iceberg tables are: Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet. Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.	2023-07-18 08:59:57 +05:30
Jakub Matyszewski	cc159f4317	docs: k8s-jobs role needs batch apigroup (#14343 )	2023-07-04 14:34:20 +05:30
Harini Rajendran	4ff6026d30	Adding SegmentMetadataEvent and publishing them via KafkaEmitter (#14281 ) In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.	2023-06-02 21:28:26 +05:30
Katya Macedo	269137c682	Update Ingestion section (#14023 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-05-19 09:42:27 -07:00
317brian	6254658f61	docs: fix links (#14111 )	2023-05-12 09:59:16 -07:00
Nicholas Lippis	5810e650d4	K8s mm less fixes (#14028 ) Update Fabric8 version and allow metrics monitors to be overriden	2023-04-05 22:23:16 +05:30
Nicholas Lippis	8a72544bd2	Hook up pod template adapter (#13966 ) * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * fix spelling errors --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-03-24 12:13:46 -06:00
Churro	c1f283fd31	Better sidecar support (#13655 ) * Better sidecar support * remove un-thrown exception from test * Druid you are such a stickler about spelling :) * Only require the primaryContainerName, no need to exclude containers	2023-02-14 10:56:15 +05:30
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependnecy * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in anotbher PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
cristian-popa	cc10350870	Collocated processes instructions (#13224 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-17 11:56:00 -07:00
Gian Merlino	d4967c38f8	Various documentation updates. (#13107 ) * Various documentation updates. 1) Split out "data management" from "ingestion". Break it into thematic pages. 2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so all conceptual content is in concepts.md and all syntax content is in reference.md. Shorten the known issues page to the most interesting ones. 3) Add SQL-based ingestion to the ingestion method comparison page. Remove the index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1. 4) Rename various mentions of "Druid console" to "web console". 5) Add additional information to ingestion/partitioning.md. 6) Remove a mention of Tranquility. 7) Remove a note about upgrading to Druid 0.10.1. 8) Remove no-longer-relevant task types from ingestion/tasks.md. 9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated. 10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some places, but it isn't very useful compared to index_parallel, so it shouldn't take up space in the sidebar. 11) Make all br tags self-closing. 12) Certain other cosmetic changes. 13) Update to node-sass 7. * make travis use node12 for docs Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>	2022-09-16 21:58:11 -07:00
DENNIS	dced61645f	prometheus-emitter supports sending metrics to pushgateway regularly … (#13034 ) * prometheus-emitter supports sending metrics to pushgateway regularly and continuously * spell check fix * Optimization variable name and related documents * Update docs/development/extensions-contrib/prometheus.md OK, it looks more conspicuous Co-authored-by: Frank Chen <frankchen@apache.org> * Update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Frank Chen <frankchen@apache.org> * When PrometheusEmitter is closed, close the scheduler * Ensure that registeredMetrics is thread safe. * Local variable name optimization * Remove unnecessary white space characters Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 20:46:14 +08:00
Frank Chen	d57557d51d	Improve doc and configuration of prometheus emitter (#13028 ) * Improve doc and validation * Add configuration for peon tasks * Update doc * Update test case * Fix typo * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-09-09 02:20:34 +08:00
senthilkv	3d9aef225d	compressed big decimal - module (#10705 ) Compressed Big Decimal is an extension which provides support for Mutable big decimal value that can be used to accumulate values without losing precision or reallocating memory. This type helps in absolute precision arithmetic on large numbers in applications, where greater level of accuracy is required, such as financial applications, currency based transactions. This helps avoid rounding issues where in potentially large amount of money can be lost. Accumulation requires that the two numbers have the same scale, but does not require that they are of the same size. If the value being accumulated has a larger underlying array than this value (the result), then the higher order bits are dropped, similar to what happens when adding a long to an int and storing the result in an int. A compressed big decimal that holds its data with an embedded array. Compressed big decimal is an absolute number based complex type based on big decimal in Java. This supports all the functionalities supported by Java Big Decimal. Java Big Decimal is not mutable in order to avoid big garbage collection issues. Compressed big decimal is needed to mutate the value in the accumulator.	2022-09-06 00:06:57 -07:00
Jianhuan Liu	d4403c15aa	Upgrade prometheus version, add more labels to PrometheusEmitter (#12769 ) Changes: - Upgrade prometheus to version 0.16.0 - Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`	2022-07-15 14:43:12 +05:30
Jihoon Son	fc9513b6cd	Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012 ) * Make nodeRole available during binding; add support for dynamic registration of DruidService * fix checkstyle and test * fix customRole test * address comments * add more javadoc	2021-12-03 11:59:00 -08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
Clint Wylie	ec334a641b	MySQL extension with MariaDB connector docs (#11608 ) * add docs for mariadb support via mysql extensions * add logging so you know what druid knows * homogenize * spelling * missed a couple	2021-08-19 01:52:26 -07:00
frank chen	bf5d829b71	Add more guidelines on the use of aliyun-oss-extensions (#11420 ) * Add more description Signed-off-by: frank chen <frank.chen021@outlook.com> * Update prefixes usage and Add troubleshooting section * Add endpoint configuration recommendation * Fix link * resolve review comments	2021-08-09 17:27:35 -07:00
Yi Yuan	8de0d36c52	Allow query through router when load moving average extension (#11276 ) * init commit * change NoopQuerySegmentWalker name * change doc * move NoopQuerySegmentWalker and add document * fix doc Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-10 18:46:53 +08:00
frank chen	e664bfd433	Improve doc of movingAverage (#11262 ) * Make doc more directive Signed-off-by: frank chen <frank.chen021@outlook.com> * Add limitation Signed-off-by: frank chen <frank.chen021@outlook.com> * Suppress spelling check error	2021-05-28 13:10:55 +08:00
Yuanli Han	34169c8550	fix doc (#11202 ) (cherry picked from commit ffb3c049726b5e461c6f7f8b6f4b75d2cb907dcc)	2021-05-05 06:17:07 -07:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00


70 Commits