druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	d1877e41ec	Use lookup memory footprint in MSQ memory computations. (#13271 ) * Use lookup memory footprint in MSQ memory computations. Two main changes: 1) Add estimateHeapFootprint to LookupExtractor. 2) Use this in MSQ's IndexerWorkerContext when determining the total amount of available memory. It's taken off the top. This prevents MSQ tasks from running out of memory when there are lookups defined in the cluster. * Updates from code review.	2022-11-03 07:36:54 -07:00
317brian	ae638e338c	docs(msq): update insert vs replace for dimension-based segment pruning (#13228 ) * docs(msq): update insert vs replace to mention dimension-based segment pruning * make suggested changes	2022-11-03 14:17:44 +05:30
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / location, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependency * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in another PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
Jason Koch	0d03ce435f	introduce a "tree" type to the flattenSpec (#12177 ) * introduce a "tree" type to the flattenSpec * feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard * feedback - expand docs to more clearly capture limitations of "tree" flattenSpec * feedback - fix for typo on docs * introduce a comment to explain defensive copy, tweak null handling * fix: part of rebase * mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type * fix: objectflattener restore previous behavior to call getRootField for root type * docs: ingestion/data-formats add note that ORC only supports path expressions * chore: linter remove unused import * fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark	2022-11-01 14:49:30 +08:00
Gian Merlino	d851985cf5	MSQ: Add support for indexSpec. (#13275 )	2022-10-28 14:27:50 -07:00
Adarsh Sanjeev	4775427e2c	Add task start status to worker report (#13263 ) * Add task start status to worker report * Address review comments * Address review comments * Update documentation * Update spelling checks	2022-10-28 12:00:15 +05:30
Tejaswini Bandlamudi	49e54a0ec6	Docs: Update inputSegmentSizeBytes description (#13266 )	2022-10-28 09:33:52 +05:30
Clint Wylie	77e4246598	add support for 'front coded' string dictionaries for smaller string columns (#12277 ) * add FrontCodedIndexed for delta string encoding * now for actual segments * fix indexOf * fixes and thread safety * add bucket size 4, which seems generally better * fixes * fixes maybe * update indexes to latest interfaces * utf8 support * adjust * oops * oops * refactor, better, faster * more test * fixes * revert * adjustments * fix prefixing * more chill * sql nested benchmark too * refactor * more comments and javadocs * better get * remove base class * fix * hot rod * adjust comments * faster still * minor adjustments * spatial index support * spotbugs * add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs * fix docs * push into constructor * use base buffer instead of copy * oops	2022-10-25 18:05:38 -07:00
317brian	c83115e4e1	api: change API page formatting (#13213 ) Tracking additional improvements requested by @paul-rogers: #13239 * api: refactor page so that indented bullet is child and unindented portion is parent * get rid of post etc headings and combine them with the endpoint * Update docs/operations/api-reference.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix broken links * fix typo Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-10-18 13:22:26 -07:00
Paul Rogers	b34b4353f4	Async reads for JDBC (#13196 ) Async reads for JDBC: Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests. Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections	2022-10-18 11:40:57 -07:00
cristian-popa	cc10350870	Collocated processes instructions (#13224 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-17 11:56:00 -07:00
Gian Merlino	3bbb76f17b	Docs: Add query/cpu/time to real-time metrics. (#13229 )	2022-10-15 18:26:44 +05:30
arvindanugula	42384d85e7	Update nested-columns.md (#13227 ) typo error corrected.	2022-10-14 16:15:46 -07:00
Victoria Lim	02ad62a08c	Docs: update description of query priority default value (#13191 ) * update description of default for query priority * update order * update terms * standardize to query context parameters	2022-10-14 14:28:04 -07:00
Karan Kumar	9d51e466b1	Minor doc update for BroadcastTablesTooLarge (#13218 ) Minor doc update for `BroadcastTablesTooLarge`. Now the user will know what to do in case this fault is encountered.	2022-10-14 09:06:55 +05:30
Tejaswini Bandlamudi	3e13584e0e	Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144 ) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides its idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and Syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes	2022-10-12 18:31:08 +05:30
Jonathan Wei	9b8e69c99a	Add inline descriptor Protobuf bytes decoder (#13192 ) * Add inline descriptor Protobuf bytes decoder * PR comments * Update tests, check for IllegalArgumentException * Fix license, add equals test * Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-11 13:37:28 -05:00
Charles Smith	25c1d55dd6	Clarify behavior when decommissioningMaxPercentOfMaxSegmentsToMove = 0 (#13157 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-10-07 09:01:32 -07:00
317brian	0edceead80	msq: update known issue about GROUPING SETS and COUNT DISTINCT (#13185 ) * msq: update known issue about GROUPING SETS and COUNT DISTINCT * address feedback from Gian	2022-10-05 19:47:03 -07:00
AmatyaAvadhanula	41e51b21c3	Make http options the default configurations (#13092 ) Druid currently uses Zookeeper dependent options as the default. This commit updates the following to use HTTP as the default instead. - task runner. `druid.indexer.runner.type=remote -> httpRemote` - load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http` - server inventory view. `druid.serverview.type=curator -> http`	2022-10-05 05:35:17 +05:30
Adarsh Sanjeev	92d2633ae6	Update ClusterByStatisticsCollectorImpl to use bytes instead of keys (#12998 ) * Update clusterByStatistics to use bytes instead of keys * Address review comments * Resolve checkstyle * Increase test coverage * Update test * Update thresholds * Update retained keys function * Update docs * Fix spelling	2022-10-03 12:08:23 +05:30
Jill Osborne	548d810baa	Correct nested columns example (#13150 )	2022-09-28 10:39:56 +05:30
David Palmer	0d7bf66578	Add a note to the documentation about pre-built HLLSketches (#13088 ) * add a note to the documentation about pre-built HLLSketches Druid actually supports ingesting a pre-generated sketch column by using the HLLSketchMerge aggregator. However, this functionality was previously not made clear in the documentation. * copyedit from the King's English to American English * add suggested style changes Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-09-27 10:29:39 +08:00
Apoorv Gupta	c8f4d72fb1	Fix documentation bug about injective lookups (#13147 ) replace mapping to `unique keys` with mapping to `unique values`.	2022-09-27 10:16:48 +08:00
Jonathan Wei	1f1fced6d4	Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089 ) * Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON * Fix serde and docs * Add PR comment check	2022-09-26 19:51:04 -05:00
Charles Smith	eb760c3d1d	update log4j example (#13095 ) * update log4j example * fix some style issues * Update docs/configuration/logging.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-22 09:46:49 +08:00
317brian	12f12a13a9	fix: fix broken postgres link (#13135 )	2022-09-22 09:46:20 +08:00
317brian	7fa35839c0	fix: follow naming convention for msq task engine (#13127 ) * fix: follow naming convention for msq task engine * more fixes * add back in experimental * fix anchor	2022-09-21 18:46:06 -07:00
Gian Merlino	2f731f356e	Update pull-deps docs with correct repo list. (#13134 ) There is only one default remote repo at this time.	2022-09-21 12:16:57 -07:00
Katya Macedo	90d14f629a	spatial-filters (#13124 )	2022-09-20 22:48:36 -07:00
hosswald	5ed5c83aab	Clarified the behaviour of SQL COUNT(DISTINCT dim) on multi-value dimensions (#13128 ) * Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns * Update docs/querying/sql-aggregations.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-09-20 18:03:34 -07:00
Vadim Ogievetsky	edc444a4bc	fix quickstart (#13126 )	2022-09-20 17:44:21 -07:00
Vadim Ogievetsky	b9edfe34a4	be consistent about referring to the web console by its name (#13118 )	2022-09-19 15:02:17 -07:00
Vadim Ogievetsky	bb0b810b1d	fix html tags in docs (#13117 ) * fix html tags in docs * revert not null	2022-09-18 19:40:33 -07:00
Gian Merlino	d9b2968edb	Docs: Clarify the situation with SELECT. (#13109 )	2022-09-17 10:47:57 -07:00
Charles Smith	b366a6c5a4	Add clarification around docker environment #8926 (#13084 ) * Add clarification around docker environment #8926 * fix spelling * Update docs/tutorials/docker.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/tutorials/docker.md Co-authored-by: Frank Chen <frankchen@apache.org> * fix nano quickstart Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-17 20:44:24 +08:00
Gian Merlino	d4967c38f8	Various documentation updates. (#13107 ) * Various documentation updates. 1) Split out "data management" from "ingestion". Break it into thematic pages. 2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so all conceptual content is in concepts.md and all syntax content is in reference.md. Shorten the known issues page to the most interesting ones. 3) Add SQL-based ingestion to the ingestion method comparison page. Remove the index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1. 4) Rename various mentions of "Druid console" to "web console". 5) Add additional information to ingestion/partitioning.md. 6) Remove a mention of Tranquility. 7) Remove a note about upgrading to Druid 0.10.1. 8) Remove no-longer-relevant task types from ingestion/tasks.md. 9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated. 10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some places, but it isn't very useful compared to index_parallel, so it shouldn't take up space in the sidebar. 11) Make all br tags self-closing. 12) Certain other cosmetic changes. 13) Update to node-sass 7. * make travis use node12 for docs Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>	2022-09-16 21:58:11 -07:00
Vadim Ogievetsky	2493eb17bf	Doc fixes around msq (#13090 ) * remove things that do not apply * fix more things * pin node to a working version * fix * fixes * known issues tidy up * revert auto formatting changes * remove management-uis page which is 100% lies * don't mention the Coordinator console (that no longer exists) * goodies * fix typo	2022-09-16 02:15:26 -07:00
Katya Macedo	2218c8d23c	Documentation: Update spatial indexing example (#12555 ) * fix spatial indexing example * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update text and example * Format JSON example * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Accept review suggestions Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-16 10:32:19 +08:00
Atul Mohan	a8fd3a9077	Provide service specific log4j overrides in containerized deployments (#13020 ) * Provide service specific log4j overrides * Clarify comments * Add docs	2022-09-14 11:47:11 +08:00
Benedict Jin	4bde50e683	Bump the version of Druid docker image from 0.16.0-incubating to latest (#13058 )	2022-09-10 14:06:00 +05:30
Vadim Ogievetsky	4fc43670e5	adjust docs and images (#13067 )	2022-09-10 14:05:19 +05:30
DENNIS	dced61645f	prometheus-emitter supports sending metrics to pushgateway regularly … (#13034 ) * prometheus-emitter supports sending metrics to pushgateway regularly and continuously * spell check fix * Optimization variable name and related documents * Update docs/development/extensions-contrib/prometheus.md OK, it looks more conspicuous Co-authored-by: Frank Chen <frankchen@apache.org> * Update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Frank Chen <frankchen@apache.org> * When PrometheusEmitter is closed, close the scheduler * Ensure that registeredMetrics is thread safe. * Local variable name optimization * Remove unnecessary white space characters Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 20:46:14 +08:00
sachidananda007	48c99054d0	Update tutorial-kafka.md (#13056 ) * Update tutorial-kafka.md Added missing command to the doc for zookeeper before starting kafka * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 10:06:19 +08:00
Frank Chen	d57557d51d	Improve doc and configuration of prometheus emitter (#13028 ) * Improve doc and validation * Add configuration for peon tasks * Update doc * Update test case * Fix typo * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-09-09 02:20:34 +08:00
Rohan Garg	7aa8d7f987	Add query/time metric for SQL queries from router (#12867 ) * Add query/time metric for SQL queries from router * Fix query cancel bug when user has overridden native query-id in a SQL query	2022-09-07 13:54:46 +05:30
Adam Peck	ee22663dd3	Add interpolation to JsonConfigurator (#13023 ) * Add interpolation to JsonConfigurator * Fix checkstyle * Fix tests by removing common-text override * Add back commons-text without version * Remove unused hadoopDir configs * Move some stuff to hopefully pass coverage	2022-09-07 12:48:01 +05:30
Clint Wylie	a3a377e570	more consistent expression error messages (#12995 ) * more consistent expression error messages * review stuff * add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff * fixes * add expression transform name to transformer failure, better parse_json error messaging	2022-09-06 23:21:38 -07:00
Jill Osborne	1f69140623	Nested columns documentation (#12946 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: brian.le <brian.le@imply.io>	2022-09-06 14:42:18 -07:00
Vadim Ogievetsky	897689c03b	remove mentions of DruidQueryRel from docs (#13033 ) * remove mentions of DruidQueryRel * Update docs/querying/sql-translation.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql-translation.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-09-06 13:37:27 -07:00
317brian	d4233ef2a1	msq: add multi-stage-query docs (#12983 ) * msq: add multi-stage-query docs * add screenshots add back theta sketches tutorial change filename fix filename fix link fix headings * fixes * fixes * fix spelling issues and update spell file * address feedback from karan * add missing guardrail to known issues * update blurb * fix typo * remove durable storage info * update titles * Restore en.json * Update query view * address comments from vad * Update docs/multi-stage-query/msq-known-issues.md finish sentence * add apache license to docs * add apache license to docs Co-authored-by: Katya Macedo <katya.macedo@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-09-06 23:06:09 +05:30
senthilkv	3d9aef225d	compressed big decimal - module (#10705 ) Compressed Big Decimal is an extension which provides support for a mutable big decimal value that can be used to accumulate values without losing precision or reallocating memory. This type helps in absolute precision arithmetic on large numbers in applications where a greater level of accuracy is required, such as financial applications and currency-based transactions. This helps avoid rounding issues wherein a potentially large amount of money can be lost. Accumulation requires that the two numbers have the same scale, but does not require that they are of the same size. If the value being accumulated has a larger underlying array than this value (the result), then the higher order bits are dropped, similar to what happens when adding a long to an int and storing the result in an int. A compressed big decimal that holds its data with an embedded array. Compressed big decimal is an absolute number based complex type based on big decimal in Java. This supports all the functionalities supported by Java Big Decimal. Java Big Decimal is not mutable in order to avoid big garbage collection issues. Compressed big decimal is needed to mutate the value in the accumulator.	2022-09-06 00:06:57 -07:00
zemin	6805a7f9c2	Ease of hiding sensitive properties from /status/proper… (#12950 ) * apache#12063 Ease of hiding sensitive properties from /status/properties endpoint * apache#12063 Ease of hiding sensitive properties from /status/properties endpoint * apache#12063 Ease of hiding sensitive properties from /status/properties endpoint using one property for hiding properties, updated the index.md to document hiddenProperties * apache#12063 Ease of hiding sensitive properties from /status/properties endpoint Added java docs * apache#12063 Ease of hiding sensitive properties from /status/properties endpoint Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties fixed typo and removed redundant space Co-authored-by: zemin <zemin.piao@adyen.com>	2022-09-02 08:51:25 -05:00
Gian Merlino	85d2a6d879	Improve range partitioning docs. (#13016 ) Two improvements: - Use a realistic targetRowsPerSegment, so if people copy and paste the example from the docs, it will generate reasonable segments. - Spell "countryName" correctly.	2022-09-01 15:21:30 -07:00
Gian Merlino	48ceab2153	Add Java 17 information to documentation. (#12990 ) The docs say Java 17 support is experimental, and give tips on running successfully with Java 17. This patch also removes java.base/jdk.internal.perf and jdk.management/com.sun.management.internal from the list of required exports and opens, because they were formerly needed for JvmMonitor, which was rewritten in #12481 to use MXBeans instead.	2022-08-30 12:32:49 -07:00
Clint Wylie	16f5ac5bd5	json_value adjustments (#12968 ) * json_value adjustments changes: * native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type * rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument * fix json_value_any to be usable by humans too, coverage * fix bug * checkstyle * checkstyle * review stuff * validate that options to json_value are the supported options rather than ignore them * remove more legacy undocumented functions	2022-08-27 07:15:47 -07:00
Alexander Saydakov	7e2371bbde	KLL sketch (#12498 ) * KLL sketch * added documentation * direct static refs * direct static refs * fixed test * addressed review points * added KLL sketch related terms * return a copy from get * Copy unions when returning them from "get". * Remove redundant "final". Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2022-08-26 21:19:24 -07:00
Adam Peck	21b73bde20	Update Curator to 5.3.0 (#12939 ) * Update Curator to 5.3.0 * Update licenses.yaml * Fix inspections + add tests. * Fix checkstyle * Another intellij inspection fix * Update curator exclusions * Cleanup new exhibitor references * Remove unused dep and checkstyle fix	2022-08-26 18:23:40 -07:00
Jill Osborne	7a1e1f88bb	Remove experimental note from stable features (#12973 ) * Removed experimental note for features that are no longer experimental * Updated native batch doc	2022-08-25 09:26:46 -07:00
Santosh Pingale	31dc9004bd	Auto-reload TLS certs for druid endpoints (#12933 ) * #12064 Auto-reload tls certs for druid endpoints * #12064 Add missing toString param * #12064 Add tests and new jks Co-authored-by: zemin-piao <pzm6391@gmail.com> * #12064 Refine tests * #12064 Add documentation * Apply suggestions from code review Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: santosh <santosh.pingale@adyen.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-08-25 20:12:43 +08:00
Victoria Lim	02914c17b9	Tutorial on ingesting and querying Theta sketches (#12723 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-08-24 09:23:22 -07:00
Karan Kumar	f7c6316992	Setting useNativeQueryExplain to true (#12936 ) * Setting useNativeQueryExplain to true * Update docs/querying/sql-query-context.md Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com> * Fixing tests * Fixing broken tests Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com>	2022-08-24 17:39:55 +05:30
Petar Petrov	6fec1d4c95	Add useNativeQueryExplain in sql query context documentation (#12924 ) (#12934 ) Co-authored-by: Petar Petrov <petar.petrov@system73.com>	2022-08-22 16:31:15 +05:30
AmatyaAvadhanula	379df5f103	Kinesis docs and logs improvements (#12886 ) Going ahead with the merge. CI is failing because of a code coverage change in the log line.	2022-08-22 14:49:42 +05:30
Rohan Garg	3c129f6728	Add sql planning time metric (#12923 )	2022-08-22 11:09:44 +05:30
Clint Wylie	f8097ccfaa	basic docs for nested column query functions (#12922 ) * basic docs for nested column query functions	2022-08-19 17:12:19 -07:00
Clint Wylie	69fe1f04e5	document virtualColumns in native query documentation, fix some redirects (#12917 ) * document virtualColumns in native query documentation, fix some redirects * after all that, forgot to run spellcheck locally * review stuff	2022-08-18 20:49:23 -07:00
Ian Roberts	770358dc34	Update tls-support.md (#12916 ) Fixing " lists all possible values for the configs belong" in TLS section	2022-08-18 09:46:30 +08:00
Peter Marshall	f665a0c077	Docs - Add links to Basic Tuning guide in process pages (#12741 ) Added link to the relevant section of the Basic Cluster Tuning page on each process page. This is in order to improve access to this information, which is not easy to find through search or nav.	2022-08-16 18:42:44 +05:30
Lucas Capistrant	ec8bdeb9f6	Document missing property - druid.announcer.skipSegmentAnnouncementOnZk (#12891 ) * document missing config related to segment announcement * improve wording * improve wording * update docs	2022-08-16 12:32:56 +05:30
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Lucas Capistrant	3a3271eddc	Introduce defaultOnDiskStorage config for Group By (#12833 ) * Introduce defaultOnDiskStorage config for groupBy * add debug log to groupby query config * Apply config change suggestion from review * Remove accidental new lines * update default value of new default disk storage config * update debug log to have more descriptive text * Make maxOnDiskStorage and defaultOnDiskStorage HumanReadableBytes * improve test coverage * Provide default implementation to new default method on advice of reviewer	2022-08-12 09:40:21 -07:00
David Palmer	2855fb6ff8	Change Kafka Lookup Extractor to not register consumer group (#12842 ) * change kafka lookups module to not commit offsets The current behaviour of the Kafka lookup extractor is to not commit offsets by assigning a unique ID to the consumer group and setting auto.offset.reset to earliest. This does the job but also pollutes the Kafka broker with a bunch of "ghost" consumer groups that will never again be used. To fix this, we now set enable.auto.commit to false, which prevents the ghost consumer groups being created in the first place. * update docs to include new enable.auto.commit setting behaviour * update kafka-lookup-extractor documentation Provide some additional detail on functionality and configuration. Hopefully this will make it clearer how the extractor works for developers who aren't so familiar with Kafka. * add comments better explaining the logic of the code * add spelling exceptions for kafka lookup docs	2022-08-09 16:14:22 +05:30
絵空事スピリット	ebe783dbdc	Correct minor format issue (#12882 )	2022-08-09 18:15:41 +08:00
Hamish Ball	abd7a9748d	Remove kafka lookup records when a record is tombstoned (#12819 ) * remove kafka lookup records from factory when record tombstoned * update kafka lookup docs to include tombstone behaviour * change test wait time down to 10ms Co-authored-by: David Palmer <david.palmer@adscale.co.nz>	2022-08-09 10:42:51 +05:30
David Hergenroeder	533c39f35a	Fix rollup docs bullet formatting (#12876 )	2022-08-09 10:10:07 +08:00
Suneet Saldanha	267b32c2e2	Set druid.processing.fifo to true by default (#12571 )	2022-08-08 10:18:24 -07:00
Gian Merlino	01d555e47b	Adjust "in" filter null behavior to match "selector". (#12863 ) * Adjust "in" filter null behavior to match "selector". Now, both of them match numeric nulls if constructed with a "null" value. This is consistent as far as native execution goes, but doesn't match the behavior of SQL = and IN. So, to address that, this patch also updates the docs to clarify that the native filters do match nulls. This patch also updates the SQL docs to describe how Boolean logic is handled in addition to how NULL values are handled. Fixes #12856. * Fix test.	2022-08-08 09:08:36 -07:00
AmatyaAvadhanula	d294404924	Kinesis ingestion with empty shards (#12792 ) Kinesis ingestion requires all shards to have at least 1 record at the required position in druid. Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic. Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively. These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard. If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset. These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number. However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation: The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, the current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering). Kinesis sequence numbers are inclusive, i.e. if current sequence == end sequence, there are more records left to read. However, the equality check is exclusive when dealing with UNREAD tokens.	2022-08-05 22:38:58 +05:30
Katya Macedo	c6dd9dd4af	Fix typo in compaction.md (#12774 )	2022-08-04 14:47:22 -07:00
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Gian Merlino	2912a36a20	Use nonzero default value of maxQueuedBytes. (#12840 ) * Use nonzero default value of maxQueuedBytes. The purpose of this parameter is to prevent the Broker from running out of memory. The prior default is unlimited; this patch changes it to a relatively conservative 25MB. This may be too low for larger clusters. The risk is that throughput can decrease for queries with large resultsets or large amounts of intermediate data. However, I think this is better than the risk of the prior default, which is that these queries can cause the Broker to go OOM. * Alter calculation.	2022-08-02 17:57:27 -07:00
317brian	553ff47616	fix: fix broken link to Class TTest (#12836 )	2022-07-31 10:18:14 +08:00
Charles Smith	efbb58e90e	docs: remove maxRowsPerSegment where appropriate (#12071 ) * remove maxRowsPerSegment where appropriate * fix tutorial, accept suggestions * Update docs/design/coordinator.md * additional tutorial file * fix initial index spec * accept comments * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add back comment on maxrows per segment * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md 
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rm duplicate entry * Update native-batch-simple-task.md remove ref to `maxrowspersegment` * Update native-batch.md remove ref to `maxrowspersegment` * final tenticles * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-28 16:52:13 +05:30
Atul Mohan	93a9a4b1c5	Add retention for file request logs (#12559 ) * Add retention for file request logs * Spelling	2022-07-27 08:17:02 -07:00
Charles Smith	d7d4314367	remove ref to plywood repo (#12809 )	2022-07-26 10:12:13 +08:00
Victoria Lim	6394ecfd21	update figure and reference (#12813 )	2022-07-22 15:54:25 -07:00
Katya Macedo	a2be685824	Remove the time bit, fix headings (#12808 ) * Remove the time bit, fix headings * Adopt review suggestions * Edits * Update smoosh file description * Adopt review suggestions * Update spelling	2022-07-20 15:37:57 -07:00
Katya Macedo	809bf161ce	Add a note about setting the value of maxNumConcurrentSubTasks (#12772 ) * Add clarification for combining input source * Update inputFormat note * Update maxNumConcurrentSubTasks note * Fix broken link * Update docs/ingestion/native-batch-input-source.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-07-19 15:34:21 -07:00
Atul Mohan	75045970cd	S3 Ingestion from non-default endpoints (#11798 ) * Add endpoint support for s3inputsource * Changes to tests * Fix docs * Fix config * Fix inspections * Fix spelling * Remove password from toString	2022-07-15 11:03:34 -07:00
Jianhuan Liu	d4403c15aa	Upgrade prometheus version, add more labels to PrometheusEmitter (#12769 ) Changes: - Upgrade prometheus to version 0.16.0 - Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`	2022-07-15 14:43:12 +05:30
Frank Chen	a544aff761	Document missed simple granularities (#12768 ) * Document missed simple granularities * Update docs/querying/granularities.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/granularities.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-14 14:02:28 +08:00
zachjsh	c0380e7b0a	* fix duplicate dimension (#12778 )	2022-07-14 10:39:03 +05:30
Victoria Lim	d8f8c56f94	Docs: Index page with all SQL functions (#12771 ) * list of all functions * add function names to spelling file	2022-07-14 09:59:55 +08:00
TSFenwick	8c02880d5f	Emit metrics for distribution of number of rows per segment (#12730 ) * initial commit of bucket dimensions for metrics return counts of segments that have rowcount in a bucket size for a datasource return average value of rowcount per segment in a datasource added unit test naming could use a lot of work buckets right now are not finalized added javadocs altered metrics.md * fix checkstyle issues * addressed review comments add monitor test move added functionality to new monitor update docs * address comments renamed monitor handle tombstones better update docs added javadocs * Add support for tombstones in the segment distribution * undo changes to tombstone segmentizer factory * fix accidental whitespacing changes * address comments regarding metrics documentation and rename variable to be more accurate * fix tests * fix checkstyle issues * fix broken test * undo removal of timeout	2022-07-12 07:04:42 -07:00
Gian Merlino	97207cdcc7	Automatic sizing for GroupBy dictionaries. (#12763 ) * Automatic sizing for GroupBy dictionary sizes. Merging and selector dictionary sizes currently both default to 100MB. This is not optimal, because it can lead to OOM on small servers and insufficient resource utilization on larger servers. It also invites end users to try to tune it when queries run out of dictionary space, which can make things worse if the end user sets it too high. So, this patch: - Adds automatic tuning for selector and merge dictionaries. Selectors use up to 15% of the heap and merge buffers use up to 30% of the heap (aggregate across all queries). - Updates out-of-memory error messages to emphasize enabling disk spilling vs. increasing memory parameters. With the memory parameters automatically sized, it is more likely that an end user will get benefit from enabling disk spilling. - Removes the query context parameters that allow lowering of configured dictionary sizes. These complicate the calculation, and I don't see a reasonable use case for them. * Adjust tests. * Review adjustments. * Additional comment. * Remove unused import.	2022-07-11 08:20:50 -07:00
Jill Osborne	682ea7f32d	IMPLY-12348: Update description of UNION ALL in SQL syntax doc (#12710 ) * IMPLY-12348: Updated description of UNION ALL * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update sql.md * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-05 13:08:01 -07:00
Rui Chen	068bea6334	deps: upgrade mysql-connector-java to v5.1.49 (#12704 )	2022-06-29 23:15:46 +08:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where one of the JSON specs was 16MB, and that's simply too much for the Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid perform the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Gian Merlino	d29343cbe3	Disable autokill of segments by default. (#12693 ) Also add clarifying commentary to the documentation about how durationToRetain works.	2022-06-23 17:17:11 -07:00
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678 )	2022-06-22 09:05:03 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update segments.md * Update segments.md * Update segments.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim 
<vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
Victoria Lim	94564b6ce6	Update screenshots for Druid console doc (#12593 ) * druid console doc updates * remove extra image * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * updated screenshot labels Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-15 16:42:20 -07:00
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
Victoria Lim	1506b26ce4	fix typo (#12607 )	2022-06-04 13:14:18 +08:00
Gian Merlino	a27f4f5740	Service stdout log files, move logs to log/. (#12570 ) * Service stdout log files, move logs to log/. Two changes that make log behavior cleaner: 1) Redirect messages from the Java runtime to their own log files. Otherwise, they would get jumbled up in the output of the all-in-one start command. 2) Use log/ instead of bin/log/ for the default log directory. Makes them easier to find. Additionally, add documentation about how to avoid the reflective access warnings in Java 11. * Spelling. * See if code formatting affects spelling.	2022-06-03 10:44:29 +05:30
Jill Osborne	9c8e6bb000	Addition to Multitenancy considerations doc (#12567 ) * Small addition to Multitenancy considerations doc * Update docs/querying/multitenancy.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update multitenancy.md Edit suggested by @kfaraz Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-02 10:32:14 -07:00
Dr. Sizzles	7291c92f4f	Adding zstandard compression library (#12408 ) * Adding zstandard compression library * 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct. 2. Cleaned up checkstyle issues. * Fixing zstandard version to latest stable version in poms and updating license files * Removing zstd from benchmarks and adding to processing (poms) * fix the intellij inspection issue * Removing the prefix v for the version in the license check for zstd * Fixing license checks Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-05-28 17:01:44 -07:00
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
Gian Merlino	37853f8de4	ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513 ) * ConcurrentGrouper: Add option to always slice up merge buffers thread-locally. Normally, the ConcurrentGrouper shares merge buffers across processing threads until spilling starts, and then switches to a thread-local model. This minimizes memory use and reduces likelihood of spilling, which is good, but it creates thread contention. The new mergeThreadLocal option causes a query to start in thread-local mode immediately, and allows us to experiment with the relative performance of the two modes. * Fix grammar in docs. * Fix race in ConcurrentGrouper. * Fix issue with timeouts. * Remove unused import. * Add "tradeoff" to dictionary.	2022-05-21 10:28:54 -07:00
Katya Macedo	5073cee73f	Fix zookeeper spelling (#12556 )	2022-05-21 16:14:02 +08:00
Gian Merlino	65a1375b67	SQL: Add is_active to sys.segments, update examples and docs. (#11550 ) * SQL: Add is_active to sys.segments, update examples and docs. is_active is short for: (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1 It's important because this represents "all the segments that should be queryable, whether or not they actually are right now". Most of the time, this is the set of segments that people will want to look at. The web console already adds this filter to a lot of its queries, proving its usefulness. This patch also reworks the caveat at the bottom of the sys.segments section, so its information is mixed into the description of each result field. This should make it more likely for people to see the information. * Wording updates. * Adjustments for spellcheck. * Adjust IT.	2022-05-19 14:23:28 -07:00
Charles Smith	3e8d7a6d9f	Sql docs items (#12530 ) * touch up sql refactor * brush up SQL refactor * incorporate feedback * reorder sql * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-05-17 16:56:31 -07:00
Katya Macedo	177638f171	Fix typo, add comma (#12529 )	2022-05-17 16:42:47 -07:00
Gian Merlino	fdfecfd996	Improved docs for range partitioning. (#12350 ) * Improved docs for range partitioning. 1) Clarify the benefits of range partitioning. 2) Clarify which filters support pruning. 3) Include the fact that multi-value dimensions cannot be used for partitioning. * Additional clarification. * Update other section. * Another adjustment. * Updates from review.	2022-05-16 09:42:31 -07:00
Hellmar Becker	985640f103	Clarify the use of the Lookup API (#12088 ) * Update lookups.md * Update docs/querying/lookups.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/querying/lookups.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2022-05-16 07:50:24 -07:00
317brian	351e57bdb6	docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459 ) * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works" This reverts commit `cadd1fdc60`. * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md fix spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-05-16 07:48:33 -07:00
Gian Merlino	5b6727f319	Enable vectorized virtual column processing by default. (#12520 ) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue	2022-05-16 15:43:53 +05:30
Frank Chen	c33ff1c745	Enforce console logging for peon process (#12067 ) Currently all Druid processes share the same log4j2 configuration file located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit environment variables from the middle manager. These include the variables referenced in log4j2.xml that control which file the logger writes to. However, the current task logging mechanism requires the peon processes to output the log to the console so that the middle manager can redirect the console output to a file and upload this file to task log storage. So, this PR imposes this requirement on peon processes: whatever the configuration in the shared log4j2.xml, peon processes always write the log to the console.	2022-05-16 15:07:21 +05:30
Gian Merlino	ff253fd8a3	Add setProcessingThreadNames context parameter. (#12514 ) Setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.	2022-05-16 13:42:00 +05:30
Lucas Capistrant	deb69d1bc0	Allow coordinator to be configured to kill segments in future (#10877 ) Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system. A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.	2022-05-11 07:35:15 +05:30
Kashif Faraz	60b4fa0f75	Docs: Fix column name in ingestion rollup doc (#12036 ) Fix the referred column name from "count" to "num_rows" as "count" vs. "COUNT(*)" might be a little confusing in this example.	2022-05-10 17:35:59 +05:30
Rohan Garg	75836a5a06	Add feature flag for sql planning of TimeBoundary queries (#12491 ) * Add feature flag for sql planning of TimeBoundary queries * fixup! Add feature flag for sql planning of TimeBoundary queries * Add documentation for enableTimeBoundaryPlanning * fixup! Add documentation for enableTimeBoundaryPlanning	2022-05-10 15:23:42 +05:30
Rohan Garg	2dd073c2cd	Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484 ) * Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * Document vectorized dimension	2022-05-09 10:40:17 -07:00
Victoria Lim	0206a2da5c	Update automatic compaction docs with consistent terminology (#12416 ) * specify automatic compaction where applicable * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update for style and consistency * implement suggested feedback * remove duplicate example * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/compaction.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/api-reference.md * update .spelling * Adopt review suggestions Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2022-05-03 16:22:25 -07:00
Rocky Chen	770ad95169	Add a metric for task duration in the pending queue (#12492 ) This PR is to measure how long a task stays in the pending queue and emits the value with the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner. An example of the metric: ``` 2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"} ``` ------------------------------------------ Key changed/added classes in this PR Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner. Update related factory classes and tests.	2022-05-02 23:47:25 -04:00
317brian	b97f273d5a	docs: fix typo (#12494 )	2022-05-01 22:44:31 +08:00
Charles Smith	42fa5c26e1	remove arbitrary granularity spec from docs (#12460 ) * remove arbitrary granularity spec from docs * Update docs/ingestion/ingestion-spec.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-28 16:36:54 -07:00
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * * fix metric name inconsistency * * add task slot metrics for middle managers * * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * * more stuff * * remove unused variable * * more stuff * * add javadocs * * fix checkstyle * * fix hadoop test failure * * cleanup * * add more code coverage in tests * * fix test failure * * add docs * * increase code coverage * * fix spelling * * fix failing tests * * remove dead code * * fix spelling	2022-04-26 11:44:44 -05:00
Peter Marshall	b47316b844	Update native-batch.md (#12478 ) Fixed indent on the Granularity Spec section and removed some superfluous tabbings.	2022-04-25 21:44:17 +08:00
Apoorv Gupta	4781af9921	Fix formatting in stats.md (#12470 ) * Fix formatting in stats.md * Update stats.md * Update docs/development/extensions-core/stats.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/development/extensions-core/stats.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-23 11:35:08 +08:00
Victoria Lim	63a993c33a	stringFirst and stringLast supported in ingestion (#12466 )	2022-04-22 10:28:49 +08:00
Victoria Lim	f95447070e	updated docs for sql query context (#12406 )	2022-04-21 11:19:39 -07:00
jacobtolar	0edc22179c	Document expression post-aggregators (#11896 ) * Document expression post-aggregators * Update docs/querying/post-aggregations.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-19 10:36:19 +08:00
Victoria Lim	c86c48203e	recommendation for comparing strings and numbers (#12442 )	2022-04-18 09:28:32 -07:00
Peter Marshall	5167d328b1	Docs - query caching (#11584 ) * Update caching.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900 Update caching.md A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300 * Update caching.md Typos * Amendments on the segment cache Significant updates on content around the segment cache, pull process, and in-memory cache * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md 
Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update basic-cluster-tuning.md typo * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Whole-query caching update Made more succinct and removed specific config to change. * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-18 17:00:21 +08:00
Charles Smith	408b46ae9f	Fixes a small typo in ingestion spec doc (#12143 ) * small typo * Update docs/ingestion/ingestion-spec.md Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2022-04-18 16:53:50 +08:00
Peter Marshall	1201c9b2e5	Docs - added another common config property to tuningConfig (#11935 ) * Update ingestion-spec.md Added indexSpecForIntermediatePersists as a common configuration property. * Update ingestion-spec.md Amended to remove "below" and add link to the table. * Update ingestion-spec.md Removed passive.	2022-04-18 13:41:39 +08:00
Alexandre BERTHIOT	9f2b37f250	Update tutorial-compaction.md to change an unclear statement (#11988 ) * Update tutorial-compaction.md Unclear statement on the explanation of tuningConfig section. * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-04-18 13:25:09 +08:00
Maytas Monsereenusorn	5d37d9f9d8	Add docs to metric spec for auto compaction (#12415 ) * add docs * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update index.md * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-13 13:27:00 -07:00
Katya Macedo	f24e9c6862	Add Kinesis ListShards permission (#12387 ) * add Kinesis permission * List Kinesis IAM permissions * Adopt review suggestions * Fix merge conflicts	2022-04-13 15:29:56 +05:30
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
mark-imply	bf96ddf5ba	Update index.md (#12390 ) Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.	2022-04-08 18:01:54 +05:30
mark-imply	d98cbd90f0	Update basic-cluster-tuning.md (#12412 ) Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.	2022-04-08 15:29:55 +05:30
317brian	d82a8185d1	fix(docs): clarify what s3 permissions are needed based on the access management type (#12405 ) * fix(docs): clarify what s3 permissions are needed based on the permissions model * fix typo * Update docs/development/extensions-core/s3.md Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-04-07 16:22:56 -07:00
Victoria Lim	e6229b76a6	Document data format and example for featureSpec (#12394 ) * add data format and example for featureSpec * add second feature in example * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-06 15:17:15 -07:00
317brian	ac6c24793e	docs(fix): add clarity around granularitySpec (#12362 ) * fix: add clarify around granularitySpec * fix spacing * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-06 09:24:37 -07:00
Victoria Lim	d326c681c1	Document config for ingesting null columns (#12389 ) * config for ingesting null columns * add link * edit .spelling * what happens if storeEmptyColumns is disabled	2022-04-05 09:15:42 -07:00
AmatyaAvadhanula	067254b778	Package kinesis client jar within the extension (#12370 ) amazon-kinesis-client was not covered under the Apache license and required separate insertion in the kinesis extension. This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities. Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.	2022-04-04 21:31:18 +05:30
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
AmatyaAvadhanula	c5531be553	Add feature flag for Kinesis listShards API usage (#12383 ) listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161. However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html). A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.	2022-04-04 14:58:10 +05:30
somu-imply	a1ea658115	Introducing a new config to ignore nulls while computing String Cardinality (#12345 ) * Counting nulls in String cardinality with a config * Adding tests for the new config * Wrapping the vectorize part to allow backward compatibility * Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes * Updating testcase and code * Adding null handling test to improve coverage * Checkstyle fix * Adding 1 more change in docs * Making docs clearer	2022-03-29 14:31:36 -07:00
Peter Marshall	f1841c6444	Docs - S3 masking and nav update to S3 page (#11490 ) * Docs: Masking S3 creds and some rewording Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688 * Removed bold in one of the quote sections * Update s3.md * Update s3.md Quick grammar change * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update s3.md Typo * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update s3.md Active lang * Update s3.md LAng nit * Update native-batch.md LAng nit * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Grammar tidy-up and link fix Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes. * Update docs/development/extensions-core/s3.md * Update s3.md Removed an Erroneous E Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-29 09:13:05 -07:00
Peter Marshall	b9a968e7ff	Docs – expressions link back and timestamp hint (#11674 ) * Update math-expr.md Link back to transformSpec * Update ingestion-spec.md Moved info about using the timestamp inside transforms into the actual timestamp section. * Update ingestion-spec.md Active language.	2022-03-29 09:12:30 -07:00
mark-imply	3c55565398	Update ingestion-spec.md (#12371 ) * Update ingestion-spec.md Added best practice point to dimensions description. * Update docs/ingestion/ingestion-spec.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-29 09:12:02 -07:00
Victoria Lim	9ed7aa33ec	Docs for request logging (#12363 ) * add docs for request logging * remove stray character * Update docs/operations/request-logging.md Co-authored-by: TSFenwick <tsfenwick@gmail.com> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: TSFenwick <tsfenwick@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-28 14:09:41 -07:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcite's InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with a large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
Frank Chen	d745d0b338	Add JDK 11 (#12333 )	2022-03-16 15:03:04 -07:00
Dr. Sizzles	69f928f50e	Adding k8s support for human readable parsing (#12316 ) * Adding k8s support for human readable parsing * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java Co-authored-by: Frank Chen <frankchen@apache.org> * Changes per review Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-03-16 11:18:47 +08:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool. Description: Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch: Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR. Algorithm: The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were considered equal due to their id's being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. * Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	875e0696e0	GroupBy: Cap dictionary-building selector memory usage. (#12309 ) * GroupBy: Cap dictionary-building selector memory usage. New context parameter "maxSelectorDictionarySize" controls when the per-segment processing code should return early and trigger a trip to the merge buffer. Includes: - Vectorized and nonvectorized implementations. - Adjustments to GroupByQueryRunnerTest to exercise this code in the v2SmallDictionary suite. (Both the selector dictionary and the merging dictionary will be small in that suite.) - Tests for the new config parameter. * Fix issues from tests. * Add "pre-existing" to dictionary. * Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods. * Adjustments from review comments.	2022-03-08 13:13:11 -08:00
Victoria Lim	903174de20	correct errors on compaction doc (#12308 )	2022-03-04 15:33:35 -08:00
Gian Merlino	3b373114dc	Officially support Java 11. (#12232 ) There aren't any changes in this patch that improve Java 11 compatibility; these changes have already been done separately. This patch merely updates documentation and explicit Java version checks. The log message adjustments in DruidProcessingConfig are there to make things a little nicer when running in Java 11, where we can't measure direct memory _directly_, and so we may auto-size processing buffers incorrectly.	2022-03-04 14:15:45 -08:00
Sandeep	61e1ffc7f7	add a new query laning metrics to visualize lane assignment (#12111 ) * add a new query laning metrics to visualize lane assignment * fixes :spotbugs check * Update docs/operations/metrics.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2022-03-04 15:21:17 +08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Karan Kumar	b94390ba33	Adding Shared Access resource support for azure (#12266 ) Azure Blob storage has multiple modes of authentication. One of them is Shared access resource. This is very useful in cases when we do not want to add the account key to the druid properties.	2022-02-22 18:27:43 +05:30
Maytas Monsereenusorn	6e2eded277	Allow coordinator run auto compaction duty period to be configured separately from other indexing duties (#12263 ) * add impl * add impl * add unit tests * add impl * add impl * add serde test * add tests * add docs * fix test * fix test * fix docs * fix docs * fix spelling	2022-02-18 23:02:57 -08:00
Karan Kumar	5794331eb1	Adding new config for disabling group by on multiValue column (#12253 ) As part of #12078, one of the follow-ups was to have a specific config which does not allow accidental unnesting of multi-value columns if such columns become part of the grouping key. Added a config groupByEnableMultiValueUnnesting which can be set in the query context. The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior. If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.	2022-02-16 20:53:26 +05:30
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hidden by default. The number of IN filters can grow up to a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a spelling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
AmatyaAvadhanula	393e9b68a8	Add config to limit task slots for parallel indexing tasks (#12221 ) In extreme cases where many parallel indexing jobs are submitted together, it is possible that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule their own sub-tasks thus stalling progress of all the indexing jobs. Key changes: - Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots for `ParallelIndexSupervisorTasks` per worker - `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior) - `ratio = 0` implies supervisor tasks can not use any slot on a worker (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs) - `ImmutableWorkerInfo.canRunTask()` - `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`	2022-02-15 23:15:09 +05:30
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Victoria Lim	24716bfedc	Doc updates for metadata cleanup and storage (#12190 ) * doc updates for metadata storage/cleanup * Add comments for disabling cleanup * Apply suggestions from code review * updated for https://github.com/apache/druid/pull/12201 * Apply suggestions from code review Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org> * move retention period line earlier; more concise text * fix typo Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2022-01-27 11:40:54 -08:00
Maytas Monsereenusorn	fac6a48a8f	add impl (#12201 )	2022-01-27 11:39:59 -08:00
Suneet Saldanha	2b32d86f3b	Enable automatic metadata cleanup by default (#12188 )	2022-01-24 20:04:17 -08:00
Victoria Lim	d2ac146365	Docs for cluster tiering to improve query concurrency (#12128 ) * add new doc * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * reorder query laning properties * rename doc * new name in doc header * organize material into "service tiering" section * text edits and update sidebars.json * update query laning * how queries get assigned to lanes * add more details to intro; use more consistent terminology * more content * Apply suggestions from code review Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/operations/mixed-workloads.md * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * typo Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-01-15 12:22:08 +08:00
Jonathan Wei	74c876e578	Throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder (#12080 ) * Add option to throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder * Remove option	2022-01-13 12:36:51 -06:00
Clint Wylie	f2ce76966c	add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous (#12145 ) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better	2022-01-12 03:48:53 -08:00
Vadim Ogievetsky	2299eb321e	Standardizing SQL function docs (#12091 ) * fix typos in SQL function docs * more code * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * a few more expr, fixes * more fixes * quote TIME_SHIFT * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * undo header change Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-01-06 23:57:03 -08:00
Jihoon Son	4a74c5adcc	Use Druid's extension loading for integration test instead of maven (#12095 ) * Use Druid's extension loading for integration test instead of maven * fix maven command * override config path * load input format extensions and kafka by default; add prepopulated-data group * all docker-composes are overridable * fix s3 configs * override config for all * fix docker_compose_args * fix security tests * turn off debug logs for overlord api calls * clean up stuff * revert docker-compose.yml * fix override config for query error test; fix circular dependency in docker compose * add back some dependencies in docker compose * new maven profile for integration test * example file filter	2022-01-05 23:33:04 -08:00
Victoria Lim	6846622080	Docs: add FILTER to sql query syntax (#12093 ) * docs: add FILTER to sql query syntax * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * move and update FILTER section	2022-01-05 12:59:41 -08:00
somu-imply	c267b65f97	Removing unused processing threadpool on broker (#12070 ) * Thread pool for broker * Updating two tests to improve coverage for new method added * Updating druidProcessingConfigTest to cover coverage * Adding missed spelling errors caused in doc * Adding test to cover lines of new function added	2021-12-21 13:07:53 -08:00
Victoria Lim	acbeae23b8	New doc for troubleshooting query execution (#12075 ) * new doc for troubleshooting query execution * add doc to sidebar * Apply suggestions from code review	2021-12-16 17:34:34 -08:00
Karan Kumar	377edff042	Ingestion metrics doc fix (#12066 ) * Ingestion metrics doc fix. * Fixing typo * Adding missed keywords in ignore list	2021-12-15 12:51:53 +05:30
Victoria Lim	4ede3bbff6	Docs updates (#12069 ) * minor updates to docs * remove en.json	2021-12-14 14:38:18 -08:00
Victoria Lim	e77bdfa70d	Document query context parameters related to join filters (#12057 ) * docs update for query context and filters * updates from review * Update docs/querying/filters.md	2021-12-13 17:47:21 -08:00
Lucas Capistrant	761fe9f144	Add new metric that quantifies how long batch ingest jobs waited for segment availability and whether or not that wait was successful (#12002 ) * add a unit test that tests that new metric is emitted * remove unused import * clarify in doc that this is for batch tasks * fix IndexTaskTest	2021-12-10 11:40:52 -06:00
Frank Chen	58245b4617	Support JsonPath functions in JsonPath expressions (#11722 ) * Add jsonPath functions support * Add jsonPath function test for Avro * Add jsonPath function length() to Orc * Add jsonPath function length() to Parquet * Add more tests to ORC format * update doc * Fix exception during ingestion * Add IT test case * Revert "Fix exception during ingestion" This reverts commit `5a5484b9ea`. * update IT test case * Add 'keys()' * Commit IT test case * Fix UT	2021-12-10 10:53:23 +08:00
shallada	25c9eba2f7	clarify time format for intervals (#12035 )	2021-12-08 08:31:21 -08:00
Lucas Capistrant	150902b95c	clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960 ) * clean up the balancing code around the batched vs deprecated way of sampling segments to balance * fix docs, clarify comments, add deprecated annotations to legacy code * remove unused variable * update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated * fix dynamic config text for percentOfSegmentsToConsiderPerMove * run prettier to cleanup coordinator-dynamic-config.tsx changes * update jest snapshot * update documentation per review feedback	2021-12-07 14:47:46 -08:00
Clint Wylie	a8815f671e	Fix druid client timeout zero (#12023 ) * fix bug where queries fail immediately when timeout is 0 instead of using default timeout * fix to use serverside max * more better * less flaky test * oops	2021-12-07 12:41:01 -08:00
Peter Marshall	0b3f0bbbd8	Docs - Metrics docs layout and info about query/bytes (#11481 ) * Metrics docs layout and info about query/bytes Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages. Update index.md Amended typo * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Feedback applied Http --> HTTP and moved content / removed > * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:45:24 -08:00
Peter Marshall	c209db3a1d	Docs - roll-up tip (#11677 ) * Update rollup.md Added SE tip around roll-up. * Update docs/ingestion/rollup.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:17:36 -08:00
Peter Marshall	d7463c99e9	Docs - Task ref logs correction (#11746 ) * Update tasks.md Removed confusing backreference * Update tasks.md Changed silly grammar.	2021-12-07 09:15:19 -08:00
Jihoon Son	fc9513b6cd	Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012 ) * Make nodeRole available during binding; add support for dynamic registration of DruidService * fix checkstyle and test * fix customRole test * address comments * add more javadoc	2021-12-03 11:59:00 -08:00
jacobtolar	f7f5505631	Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865 ) * Update docs - Kinesis InputFormat ingestion * Add avro_ocf to list of supported Kafka InputFormats * Remove extra whitespace. * Update kafka-supervisor-reference.md * Delete extra whitespace.	2021-12-03 07:57:26 -08:00
Frank Chen	4631a66723	Support rolling log files (#10147 ) * apply log file rolling strategy * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * Use absolute log path and allow spaces in log path * Update log4j2 configuration * apply FileAppender to ZooKeeper * DO NOT redirect application's console log to file in supervisor	2021-12-03 21:32:01 +08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which makes longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
benkrug	11746b8536	Update datasketches-hll.md (#12010 ) under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively." 21 is not a power of 2, nor is 12, the given default. I think there may have been confusion because lgK represents log2 of K. We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.	2021-11-30 18:52:00 -08:00
Charles Smith	f536f31229	clarify avro support & general style improvements (#11975 ) * clarify avro support & general style improvements * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update avro.md remove redundancy Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-28 16:10:18 +08:00
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Rohan Garg	2c08055962	Specify time column for first/last aggregators (#11949 ) Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.	2021-11-25 09:44:14 +05:30
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
Kashif Faraz	6607e4cc75	Docs: Remove reference to deprecated field `targetPartitionSize` (#11974 ) * Remove reference to deprecated field `targetPartitionSize` * Fix spelling of LeaderLatch	2021-11-23 15:32:16 +05:30
Peter Marshall	ed0606db69	Docs - Corrected admonition issue (#11926 ) * Corrected admonition issue * Update data-formats.md Removed all admonition bits, and took out sf linebreaks. * Update data-formats.md Changed the shocker line into something a little more practical.	2021-11-22 12:14:30 -08:00
Katya Macedo	706d057ccc	corrected leaderlatch name (#11966 )	2021-11-22 11:58:42 -08:00
jacobtolar	0a9a908031	Add inline native query example to tutorial (#11642 ) * Add inline native query example to tutorial Minor change to the tutorial that adds an example of a native HTTP query request body, and adds a link to the more detailed "native query over HTTP" documentation. * cleanup * Apply suggestions from code review. Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2021-11-22 21:35:05 +08:00
Peter Marshall	0c0001579d	Update compaction.md (#11937 ) Removed superfluous tabs that caused issues in rendering Added nav to the `inputSpec`	2021-11-22 21:33:47 +08:00
jacobtolar	3aee5d9ec3	Fix: invalid JSON in ingestion spec doc example (#11880 ) * Fix: invalid JSON in ingestion spec doc example * Update ingestion-spec.md	2021-11-22 21:33:26 +08:00
Nikhil Navadiya	3c51136098	Add worker category dimension (#11554 ) * Add worker category as dimension in TaskSlotCountStatsMonitor * Change description * Add workerConfig as field * Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics * Fixing tests * Fixing alerts * Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs * Resolving false positive spell check * addressing comments * throw UnsupportedOperationException for taskslot metrics APIs in SingleTaskBackgroundRunner Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>	2021-11-18 22:59:07 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might now how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
sthetland	02b578a3dd	Fixing a few typos and style issues (#11883 ) * grammar and format work * light writing touchup Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-16 10:13:35 -08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Jihoon Son	f91868602d	Remove stale warning for HTTP inputSource (#11907 )	2021-11-13 10:27:14 +08:00
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
Maytas Monsereenusorn	a36a41da73	Support routing data through an HTTP proxy (#11891 ) * Support routing data through an HTTP proxy * Support routing data through an HTTP proxy This adds the ability for the HttpClient to connect through an HTTP proxy. We augment the channel factory to check if it is supposed to be proxied and, if so, we connect to the proxy host first, issue a CONNECT command through to the final recipient host and then give the channel to the normal http client for usage. * add docs * address comments Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2021-11-09 17:24:06 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Jian Wang	8e7e679984	Add more metrics for Jetty server thread pool usage (#11113 ) Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.	2021-11-07 16:51:44 +05:30
zachjsh	1d6df48145	Warn if cache size of lookup is beyond max size (#11863 ) Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality is to make it easier to detect when a lookup table grows to a size that the underlying service cannot handle because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in to warn when its cache is beyond maxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the service's configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underlying table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this pr makes it so that the jdbc query results are instead streamed through an iterator.	2021-11-03 21:32:22 -04:00
Kashif Faraz	a22687ecbe	Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to skip the realtime nodes and thus it would query only Historical processes for any given segment.	2021-11-02 12:38:42 +05:30
Katya Macedo	5e1dc843d1	Fix quickstart link (#11864 )	2021-11-02 13:27:53 +08:00
Maytas Monsereenusorn	ba2874ee1f	Support changing query granularity in Auto Compaction (#11856 ) * add queryGranularity * fix checkstyle * fix test	2021-11-01 15:18:44 -07:00
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Maytas Monsereenusorn	33d9d9bd74	Add rollup config to auto and manual compaction (#11850 ) * add rollup to auto and manual compaction * add unit tests * add unit tests * add IT * fix checkstyle	2021-10-29 10:22:25 -07:00
Gian Merlino	fc95c92806	Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124 ) * Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. This patch does the following: - Removes OffheapIncrementalIndex. - Clarifies that Aggregators are required to be thread safe. - Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe. - Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.) - Makes enabling "useOffheap" with groupBy v1 an error. Rationale for removing the offheap incremental index: - It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion. - Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them. - It doesn't have meaningful testing. By the way, I do believe that, given the way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway. * Remove things that are now unused. * Revert removal of getFloat, getLong, getDouble from BufferAggregator. * OAK-related warnings, suppressions. * Unused item suppressions.	2021-10-26 08:05:56 -07:00
Sergio Ferragut	000a5551fa	docker mem reqs (#11827 ) * docker mem reqs * Update docs/tutorials/docker.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Sergio Ferragut <sergio.ferragut@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-25 12:23:25 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Charles Smith	10c5fa93f1	remove dupe sentence (#11821 )	2021-10-25 14:48:20 +05:30
Victoria Lim	43103632fb	Docs - add description on time origin (#11826 ) * add description on time origin * reorder parameter descriptions * add example of origin value	2021-10-22 14:57:13 -07:00
Charles Smith	938c1493e5	edits to kafka inputFormat (#11796 ) * edits to kafka inputFormat * revise conflict resolution description * tweak for clarity * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * style fixes * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-10-15 14:01:10 -07:00
Charles Smith	6089a168ea	Docs - update dynamic config provider topic (#11795 ) * update dynamic config provider * update topic * add examples for dynamic config provider: * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-10-14 17:51:32 -07:00
Arun Ramani	b6b42d3936	Minor processor quota computation fix + docs (#11783 ) * cpu/cpuset cgroup and procfs data gathering * Renames and default values * Formatting * Trigger Build * Add cgroup monitors * Return 0 if no period * Update * Minor processor quota computation fix + docs * Address comments * Address comments * Fix spellcheck Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>	2021-10-08 22:52:03 -05:00
Victoria Lim	42e44269be	Docs update for druid-basic-security (#11782 ) * update druid-basic-security * typo * revisions from review	2021-10-08 14:45:09 -07:00
Kashif Faraz	c2c724c065	Fix docs to explain that WRITE permissions do not include READ (#11785 ) * Fix docs to explain that WRITE and READ are exclusive * Fix indentation * Use suggested doc style	2021-10-08 14:10:20 -07:00
Charles Smith	3ecbd3aec4	docs for changes to authorization in #11718 and #11720 (#11779 ) * security recommendation * Update docs/operations/security-overview.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/operations/security-user-auth.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/operations/security-user-auth.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update security-user-auth.md add newline * Update docs/operations/security-overview.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update security-overview.md add suggestion for environment variable dynamic config provider Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Clint Wylie <cwylie@apache.org>	2021-10-08 14:04:04 -07:00
Kashif Faraz	f2d6100124	Require Datasource WRITE authorization for Supervisor and Task access (#11718 ) Follow up PR for #11680 Description Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE authorization even if they are purely informative. Changes Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks" Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history Check Datasource for all Indexing Task APIs	2021-10-08 10:39:48 +05:30
Katya Macedo	45d0ecbefb	clarify hadoop input paths (#11781 ) Co-authored-by: Katya Macedo <katya.macedo@imply.io>	2021-10-07 20:22:51 -07:00
lokesh-lingarajan	ad6609a606	Kafka Input Format for headers, key and payload parsing (#11630 ) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events are another class of data that provide useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside the kafka payload, but there is also some very useful metadata carried in kafka headers that can serve as dimensions for aggregation and in turn bring better insights. PR (https://github.com/apache/druid/pull/10730) introduced support for Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format (provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing a wrapper around any existing input format and merging the data into a single unified Druid row. 
Let's look at a sample input format from the above discussion:
"inputFormat": {
  "type": "kafka", // New input format type
  "headerLabelPrefix": "kafka.header.", // Label prefix for header columns; this avoids collisions while merging columns
  "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case the payload does not carry a timestamp
  "headerFormat": { // Header parser specifying that values are of type string
    "type": "string"
  },
  "valueFormat": { // Value parser for JSON parsing
    "type": "json",
    "flattenSpec": { "useFieldDiscovery": true, "fields": [...] }
  },
  "keyFormat": { // Key parser, also for JSON parsing
    "type": "json"
  }
}
Since we have independent sections for header, key and payload, each section can be parsed with its own parser, e.g., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending the inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blending the data, resolving conflicts in columns, and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug in a parser type for the header values and will add the default header prefix "kafka.header." (can be overridden) to attributes to avoid collisions while merging attributes with the payload. The Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from, and we should be able to plug in existing parsers. One thing to note here is that if batching is performed, the code attaches the header and key values to every record in the batch. The Kafka key parser will handle parsing the Key portion of the Kafka record and will ingest the Key with the dimension name "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieving rows, and merging the columns into one final row for Druid consumption.
KafkaInputFormat should make sure to release the resources that get allocated as part of the reader in CloseableIterator<InputRow>, in both normal and exceptional cases. When dimension/metric names conflict, the code prefers the dimension from the payload and ignores the one from the headers or key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.	2021-10-07 08:56:27 -07:00
Charles Smith	8fd17fe0af	fix a few typos in Kinesis doc (#11776 )	2021-10-06 19:43:20 -07:00
Lucas Capistrant	1930ad1f47	Implement configurable internally generated query context (#11429 ) * Add the ability to add a context to internally generated druid broker queries * fix docs * changes after first CI failure * cleanup after merge with master * change default to empty map and improve unit tests * add doc info and fix checkstyle * refactor DruidSchema#runSegmentMetadataQuery and add a unit test	2021-10-06 09:02:41 -07:00
Kashif Faraz	b688db790b	Add Broker config `druid.broker.segment.ignoredTiers` (#11766 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to ignore the segments being served by the specified historical tiers. By default, no tier is ignored. This config is useful when you want a completely isolated tier amongst many other tiers. Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn and there are several brokers Broker B1, Broker B2 .... Broker Bm If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"] for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]	2021-10-06 10:06:32 +05:30
Frank Chen	104c9a07f0	Fix broken anchor and heading levels in Kafka/Kinesis ingestion (#11748 ) * Fix broken anchor and heading levels * Fix CI	2021-10-05 19:30:50 -07:00
Charles Smith	621e5ac63f	docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565 ) * docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor * Update docs/configuration/index.md	2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn	f60b3b3bab	fix doc (#11772 )	2021-10-05 15:42:11 -07:00
Victoria Lim	a31d99fb37	update docs with X-Druid-SQL-Query-Id (#11761 ) * update docs with X-Druid-SQL-Query-Id * review comments * update header description * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-06 00:15:05 +07:00
Caroline1000	ffbe303828	Update balancer strategy recommendations (#11759 ) * Update balancer strategy recommendations * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-10-05 09:47:37 -07:00
Vaibhav	3c4bba1478	Update kinesis-ingestion.md (#11767 ) * Update kinesis-ingestion.md It seems that we are declaring (a final int) recordsPerFetch as 4000 and fetchDelayMillis as 0 in https://github.com/implydata/druid/blob/imply-2021.09/extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskIOConfig.java#L36 ``` public static final int DEFAULT_RECORDS_PER_FETCH = 4000; public static final int DEFAULT_FETCH_DELAY_MILLIS = 0; ``` updating `recordsPerFetch` and `fetchDelayMillis` to the actual default values hardcoded above. * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-04 11:26:53 -07:00
sthetland	d02d2d9d56	Design/architecture doc touchups (#11762 ) * rearrange design content * casing consistency Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-04 11:09:35 -07:00
Maytas Monsereenusorn	129911a20e	Add documentation for config to filter internal Druid-related messages from error response (#11755 ) * add doc * add doc * address comments * fix typo * address comments	2021-10-01 17:49:02 +07:00
Kashif Faraz	c641657bae	Fix router documentation for `druid.router.sql.enable` (#11716 ) * Rename field, fix router documentation * Add more lines to doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-28 22:54:13 +05:30
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Lucas Capistrant	5c3f3da146	Add handoff wait time to IngestionStatsAndErrorsTaskReportData (#11090 ) * Add handoff wait time to ingestion stats report. Refactor some code for batch handoff * fix checkstyle * Add assertion to AbstractITBatchIndexTask to make sure report reflects wait for segments happened * add docs to the task reports section of doc	2021-09-20 22:48:44 -07:00
Peter Marshall	abd19a8896	Docs - SYS query examples (#11673 ) * Update sql.md Added two example queries and adjusted formatting of one that was already there * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update sql.md Co-authored-by: Frank Chen <frankchen@apache.org>	2021-09-17 08:27:34 -07:00
Clint Wylie	5e092ccb9b	add MV_FILTER_ONLY, MV_FILTER_NONE, ListFilteredVirtualColumn (#11650 ) * add MV_FILTER_ONLY SQL function, and list filter virtual column * MV_FILTER_NONE and more tests * formatting * o yeah, forgot can do easy thing * style * hmm why was that there * test filtering on virtual column * style * meh * do it right * good bot	2021-09-16 09:31:53 -07:00
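MV_FILTER_ONLY keeps only the values of a multi-value row that appear in a supplied set, and MV_FILTER_NONE drops them. The sketch below illustrates that list filtering on a single row in plain Java. `MultiValueFilter` is a hypothetical name, and treating null entries as never matching is an assumption of the sketch, not documented Druid behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of MV_FILTER_ONLY / MV_FILTER_NONE on a single multi-value row. */
final class MultiValueFilter {
  /** Keeps only the values that are in {@code values} (MV_FILTER_ONLY-style). */
  static List<String> filterOnly(List<String> row, Set<String> values) {
    return filter(row, values, true);
  }

  /** Drops the values that are in {@code values} (MV_FILTER_NONE-style). */
  static List<String> filterNone(List<String> row, Set<String> values) {
    return filter(row, values, false);
  }

  private static List<String> filter(List<String> row, Set<String> values, boolean keepMatches) {
    List<String> out = new ArrayList<>();
    for (String v : row) {
      // Null entries never match (an assumption of this sketch).
      boolean matches = v != null && values.contains(v);
      if (matches == keepMatches) {
        out.add(v);
      }
    }
    return out;
  }
}
```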
Charles Smith	1ae1bbfc4f	docs: delete / cancel query (#11708 ) * draft delete query * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address comments * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update sql.md fix port for router * Update sql.md remove authorization until it is 403 * Update sql.md add 403 message Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2021-09-15 20:26:04 -07:00
Peter Marshall	ee009ec18e	Docs - ingestion task log config and process (#11678 ) * Update index.md Moved H4s underneath the H3 for the task log location and added hyperlinks. * Update tasks.md Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead. * Update tasks.md .html > .md * Update docs/ingestion/tasks.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2021-09-13 15:49:09 -07:00
Charles Smith	f9329fbf9e	add clarification for maxSubqueryRows (#11687 ) * add clarification for maxSubqueryRows	2021-09-13 11:49:30 -07:00
Suneet Saldanha	531d11abaf	Update description of batchProcessingMode (#11686 ) * Update description of batchProcessingMode Update the description to explicitly mention a released version of Druid that the original version was referencing * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-10 16:55:48 -07:00
Peter Marshall	f16cd2a815	Docs - granularities link back to segmentGranularity (#11672 ) * Update granularities.md Link-back to the ingestion spec as well as Native queries plus examples. * Update docs/querying/granularities.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/granularities.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-10 10:40:11 -07:00
Agustin Gonzalez	9efa6cc9c8	Make persists concurrent with adding rows in batch ingestion (#11536 ) * Make persists concurrent with ingestion * Remove semaphore but keep concurrent persists (with add) and add push in the background as well * Go back to documented default persists (zero) * Move to debug * Remove unnecessary Atomics * Comments on synchronization (or not) for sinks & sinkMetadata * Some cleanup for unit tests but they still need further work * Shutdown & wait for persists and push on close * Provide support for three existing batch appenderators using batchProcessingMode flag * Fix reference to wrong appenderator * Fix doc typos * Add BatchAppenderators class test coverage * Add log message to batchProcessingMode final value, fix typo in enum name * Another typo and minor fix to log message * LEGACY->OPEN_SEGMENTS, Edit docs * Minor update legacy->open segments log message * More code comments, mostly small adjustments to naming etc * fix spelling * Exclude BatchAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage * Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments" Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-09-08 13:31:52 -07:00
Jihoon Son	7e90d00cc0	Configurable maxStreamLength for doubles sketches (#11574 ) * Configurable maxStreamLength for doubles sketches * fix equals/hashcode and it test failure * fix test * fix it test * benchmark * doc * grouping key * fix comment * dependency check * Update docs/development/extensions-core/datasketches-quantiles.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-31 14:56:37 -07:00
zhangyue19921010	6d14ea2d14	Dynamic auto scale Kinesis-Stream ingest tasks (#10985 ) * ready to test * revert misc.xml * document kinesis md * Update docs/development/extensions-core/kafka-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update kafka-ingestion.md remove leading ` * Update kinesis-ingestion.md add missing ` Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-30 15:44:29 -07:00
Peter Marshall	e1d80d05a2	Docs - note when partitioning using concatenated dimensions (#11506 ) LGTM * Update native-batch.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400 * Update native-batch.md * Fixed broken link + some grammar * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Update native-batch.md Some grammatical wizardry. * Update native-batch.md * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review remove orphaned links Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> 
Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-30 11:59:24 -07:00
Gian Merlino	ec6c6e2d53	Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. (#11549 ) * Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. * Further clarifications. * Update docs/querying/segmentmetadataquery.md style update Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-26 15:39:40 -07:00
Charles Smith	9032a0b079	updates Kafka and Kinesis to use . Fixes some typos and other style i… (#11624 ) * updates Kafka and Kinesis to use . Fixes some typos and other style issues for Kafka. * fix spelling * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * address comments Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-26 13:22:30 -07:00
Paul Rogers	1d5438ae7c	Add details to the Docker tutorial (#11463 ) * Add details to the Docker tutorial Added links, explanations and other details to the Docker tutorial to make it easier for first-time users. * Fix spelling error And add "Jupyter" to the spelling dictionary. * Update docs/tutorials/docker.md * Update docs/tutorials/docker.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/tutorials/docker.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/tutorials/docker.md * Update docs/tutorials/docker.md Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: sthetland <steve.hetland@imply.io>	2021-08-24 08:49:29 -07:00
Jeet Patel	adb2f5c884	Add prometheus-emitter docs (#11618 ) * Add prometheus-emitter docs * Update docs/development/extensions.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-24 08:48:03 -07:00
Daegi Kim	59e560e24d	fix for numShards description (#11611 ) Co-authored-by: devin-kim <devin.kim@kakaocorp.com>	2021-08-23 14:05:03 -07:00
Charles Smith	66964a261b	fixes syntax for TRIM (#11619 ) * fixes syntax for TRIM * trim erroneous quote * fix typo	2021-08-23 11:44:19 -07:00
Clint Wylie	ec334a641b	MySQL extension with MariaDB connector docs (#11608 ) * add docs for mariadb support via mysql extensions * add logging so you know what druid knows * homogenize * spelling * missed a couple	2021-08-19 01:52:26 -07:00
Maytas Monsereenusorn	ce4dd48bb8	Support custom coordinator duties (#11601 ) * impl * fix checkstyle * fix checkstyle * fix checkstyle * add test * add test * add test * add integration tests * add integration tests * add more docs * address comments * address comments * address comments * add test * fix checkstyle * fix test	2021-08-19 11:54:11 +07:00
Charles Smith	91cd573472	fixes web console introduction and addresses linking issues (#11609 ) * fixes web console introduction and addresses linking issues * fix merge conflict	2021-08-18 08:37:05 -07:00
Arvin.Z	504e54402b	update default compression format for bitmap (#11610 ) Co-authored-by: azheng <azheng@adobe.com>	2021-08-18 14:54:27 +05:30
Karan Kumar	d1bad92880	Made the instructions of adding extra resources as part of extensions simpler (#11577 )	2021-08-17 17:33:55 +05:30
imply-jhan	332e68edb5	improve the metric definition (#11602 )	2021-08-17 12:31:42 +07:00
Gian Merlino	4e5f9cdacf	Add pushes to DataSketches in SQL docs. (#11578 ) * Add pushes to DataSketches in SQL docs. These notices were already in the native docs, but they were missing from the SQL docs. * Grammar fix.	2021-08-16 10:38:56 -07:00
Peter Marshall	8aaefb91e3	Docs - MiddleManager Affinity "strong" definition (#11480 ) * Affinity "strong" definition Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800 * Spelling corrections * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-13 19:17:16 -07:00
sthetland	95c5bc3a6d	Clarify when changes to credentialIterations take effect (#11590 ) This change updates doc to clarify when and how a change to druid.auth.authenticator.basic.credentialIterations takes effect: changes apply only to new users or existing users upon changing their password via the credentials API, which may not be the expectation.	2021-08-13 17:02:07 -07:00
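One way to see why a new credentialIterations value reaches only new users or password changes: in a PBKDF2-style scheme, each stored credential records the iteration count used when it was created, and verification reuses that stored count. The sketch below assumes that design and the JDK's `PBKDF2WithHmacSHA512` algorithm; it is not Druid's basic-security code, and `StoredCredential` is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical sketch of a PBKDF2 credential that stores its own iteration count. */
final class StoredCredential {
  private static final String ALGORITHM = "PBKDF2WithHmacSHA512";
  private static final int KEY_LENGTH_BITS = 512;

  final byte[] salt;
  final byte[] hash;
  final int iterations; // fixed when the password is set, not re-read from config

  private StoredCredential(byte[] salt, byte[] hash, int iterations) {
    this.salt = salt;
    this.hash = hash;
    this.iterations = iterations;
  }

  /** Called when a user is created or changes their password: uses the current config value. */
  static StoredCredential create(char[] password, byte[] salt, int configuredIterations) {
    byte[] copy = salt.clone();
    return new StoredCredential(copy, derive(password, copy, configuredIterations), configuredIterations);
  }

  /** Verification reuses the stored iteration count, whatever the config says today. */
  boolean matches(char[] password) {
    return MessageDigest.isEqual(hash, derive(password, salt, iterations));
  }

  private static byte[] derive(char[] password, byte[] salt, int iterations) {
    PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_LENGTH_BITS);
    try {
      return SecretKeyFactory.getInstance(ALGORITHM).generateSecret(spec).getEncoded();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    } finally {
      spec.clearPassword();
    }
  }
}
```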
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description: Description and design in the proposal #11297. Key changed/added classes in this PR: DataSegmentPusher, ShuffleClient, PartitionStat, PartitionLocation, *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
frank chen	e40be0ae28	Add SQL functions to format numbers into human readable format (#10635 ) * add binary_byte_format/decimal_byte_format/decimal_format * clean code * fix doc * fix review comments * add spelling check rules * remove extra param * improve type handling and null handling * remove extra zeros * fix tests and add space between unit suffix and number as most size-format functions do * fix tests * add examples * change function names according to review comments * fix merge Signed-off-by: frank chen <frank.chen021@outlook.com> * no need to configure NullHandling explicitly for tests Signed-off-by: frank chen <frank.chen021@outlook.com> * fix tests in SQL-Compatible mode Signed-off-by: frank chen <frank.chen021@outlook.com> * Resolve review comments * Update SQL test case to check null handling * Fix intellij inspections * Add more examples * Fix example	2021-08-13 10:27:49 -07:00
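The byte-format functions from #10635 differ mainly in the base (1024 with IEC suffixes vs. 1000 with SI suffixes) and, per the commit, put a space between the number and the unit. The hypothetical sketch below shows that arithmetic for non-negative inputs only. `HumanReadable` and its methods are illustrative names, and the two-decimal rounding is an assumption; it is not claimed to reproduce Druid's exact output.

```java
import java.util.Locale;

/** Hypothetical sketch of binary (1024) vs. decimal (1000) byte formatting. */
final class HumanReadable {
  private static final String[] BINARY_UNITS = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
  private static final String[] DECIMAL_UNITS = {"B", "KB", "MB", "GB", "TB", "PB", "EB"};

  static String binaryByteFormat(long bytes) {
    return format(bytes, 1024, BINARY_UNITS);
  }

  static String decimalByteFormat(long bytes) {
    return format(bytes, 1000, DECIMAL_UNITS);
  }

  private static String format(long bytes, int base, String[] units) {
    if (bytes < 0) {
      throw new IllegalArgumentException("negative sizes are not handled in this sketch");
    }
    if (bytes < base) {
      return bytes + " " + units[0]; // plain bytes: no fractional part
    }
    double value = bytes;
    int unit = 0;
    // Divide until the value fits below the base or we run out of suffixes.
    while (value >= base && unit < units.length - 1) {
      value /= base;
      unit++;
    }
    // A space separates the number from the unit suffix.
    return String.format(Locale.ROOT, "%.2f %s", value, units[unit]);
  }
}
```

For example, 1048576 bytes is exactly 1 MiB in the binary form, while 1000000 bytes is exactly 1 MB in the decimal form.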
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Kashif Faraz	aaf0aaad8f	Enable routing of SQL queries at Router (#11566 ) This PR adds a new property druid.router.sql.enable which allows the Router to handle SQL queries when set to true. This change does not affect Avatica JDBC requests and they are still routed by hashing the Connection ID. To allow parsing of the request object as a SqlQuery (contained in module druid-sql), some classes have been moved from druid-server to druid-services with the same package name.	2021-08-13 18:44:39 +05:30
Gian Merlino	faebefecae	Docs: add pointers from api-reference to sql docs. (#11548 )	2021-08-11 09:00:33 -07:00
Suneet Saldanha	640f63094a	fix little typo (#11573 ) * fix little typo * Update docs/misc/math-expr.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-10 21:43:01 -07:00
Clint Wylie	9af7ba9d2a	STRING_AGG SQL aggregator function (#11241 ) * add string_agg * oops * style and fix test * spelling * fixup * review stuffs	2021-08-10 13:47:09 -07:00
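As a rough illustration of what a STRING_AGG-style aggregator computes, the sketch below joins the non-null values of a column with a separator. `StringAgg` is a hypothetical name; returning null when no non-null values are seen is an assumption, and the optional size limit of the real SQL function is deliberately left out.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch of STRING_AGG-style aggregation (no size limit). */
final class StringAgg {
  static String stringAgg(List<String> values, String separator) {
    StringJoiner joiner = new StringJoiner(separator);
    boolean any = false;
    for (String v : values) {
      if (v != null) { // null inputs are skipped
        joiner.add(v);
        any = true;
      }
    }
    // Assumption: an aggregate over only nulls yields null rather than "".
    return any ? joiner.toString() : null;
  }
}
```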
benkrug	bef6f43e3d	Update math-expr.md (#11254 ) * Update math-expr.md	2021-08-09 17:46:05 -07:00
frank chen	bf5d829b71	Add more guidelines on the use of aliyun-oss-extensions (#11420 ) * Add more description Signed-off-by: frank chen <frank.chen021@outlook.com> * Update prefixes usage and Add troubleshooting section * Add endpoint configuration recommendation * Fix link * resolve review comments	2021-08-09 17:27:35 -07:00
Charles Smith	941c5ffb05	clarify JVM tmp dir requires execute on files (#11542 ) * clarify JVM tmp dir requires execute on files * code SysMonitor for spellcheck	2021-08-09 17:25:10 -07:00
Paul Rogers	3e7cba738f	Minor edits to architecture page to improve flow (#11465 ) * Minor edits to architecture page to improve flow * Fixed spelling issue	2021-08-09 07:48:29 -07:00
Yi Yuan	59c8430d29	change document (#11545 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-06 07:57:12 -07:00
Peter Marshall	60e3955adb	Docs - clarify datasource API sources (#11489 ) * Update api-reference.md Added note OTBO Druid slack * Update api-reference.md Changed to an alternative explanation * Update api-reference.md Oops fixed. * Update docs/operations/api-reference.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/operations/api-reference.md Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-08-05 11:29:33 -07:00
Suneet Saldanha	e423e99997	Update default maxSegmentsInNodeLoadingQueue (#11540 ) * Update default maxSegmentsInNodeLoadingQueue Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100. An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability. Since this is the default, Druid operators only learn about the problem after running into this instability and then looking through the docs to find that the recommended value for a large cluster is 1000. With this change, the default prevents clusters from falling over as they grow over time. * update tests * codestyle	2021-08-05 11:26:58 -07:00
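The change from 0 (unbounded) to 100 amounts to capping how many segments may wait in a single server's loading queue. The hypothetical sketch below shows that cap. `SegmentLoadingQueue` is an illustrative name, and the assumption that a rejected segment is simply retried in a later coordinator run is labeled in the code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of a per-server segment loading queue with an optional cap. */
final class SegmentLoadingQueue {
  private final int maxSegmentsInNodeLoadingQueue; // 0 = unbounded (the old default)
  private final Queue<String> pending = new ArrayDeque<>();

  SegmentLoadingQueue(int maxSegmentsInNodeLoadingQueue) {
    if (maxSegmentsInNodeLoadingQueue < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    this.maxSegmentsInNodeLoadingQueue = maxSegmentsInNodeLoadingQueue;
  }

  /** Returns false when the queue is full (assumption: the caller retries in a later run). */
  boolean offer(String segmentId) {
    if (maxSegmentsInNodeLoadingQueue > 0 && pending.size() >= maxSegmentsInNodeLoadingQueue) {
      return false;
    }
    pending.add(segmentId);
    return true;
  }

  int size() {
    return pending.size();
  }
}
```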
Maytas Monsereenusorn	3257913737	Improve query error logging (#11519 ) * Improve query error logging * add docs * address comments * address comments	2021-08-05 22:51:09 +07:00
Yi Yuan	23d7d71ea5	Add Environment Variable DynamicConfigProvider (#11377 ) * add_environment_variable_DynamicConfigProvider * fix code * code fixed * code fixed * add document * fix doc * fix doc * add more unit test * fix style * fix document * bug fixed * fix unit test * fix comment * fix test Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-04 20:26:58 -07:00
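An environment-variable config provider essentially maps config keys to environment variable names and resolves them at read time. The sketch below illustrates that idea under stated assumptions: `EnvVarConfigProviderSketch` is a hypothetical class, not Druid's provider, the environment lookup is injected (normally `System::getenv`) so it can be tested, and skipping unset variables is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of resolving config values from environment variables. */
final class EnvVarConfigProviderSketch {
  private final Map<String, String> variables; // config key -> environment variable name
  private final Function<String, String> env; // normally System::getenv

  EnvVarConfigProviderSketch(Map<String, String> variables, Function<String, String> env) {
    this.variables = variables;
    this.env = env;
  }

  Map<String, String> getConfig() {
    Map<String, String> resolved = new HashMap<>();
    for (Map.Entry<String, String> entry : variables.entrySet()) {
      String value = env.apply(entry.getValue());
      if (value != null) { // assumption: unset variables are simply omitted
        resolved.put(entry.getKey(), value);
      }
    }
    return resolved;
  }
}
```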
Yi Yuan	aa7cb50f24	Add DynamicConfigProvider for Schema Registry (#11362 ) * add_DynamicConfigProvider_for_schema_registry * bug fixed * add document * fix document * fix spot bug * fix document * inject ObjectMapper * add DynamicConfigProviderUtils * add UT * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-03 13:24:52 -07:00
frank chen	55a01a030a	Clarify that Broker caching for groupBy v2 queries does not work (#11370 ) * Add a note * Update docs/configuration/index.md Co-authored-by: sthetland <steve.hetland@imply.io> * clarify that both of non-result level cache and result level cache are not supported Co-authored-by: sthetland <steve.hetland@imply.io>	2021-08-03 10:01:15 -07:00
Yi Yuan	f1e52ab356	add doc (#11531 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-03 12:20:29 +08:00
Victoria Lim	949484728f	docs fix for doubleMean description (#11513 ) * fix for doubleMean description * include quantile aggregator description from Suneet * update hyperlink to quantiles aggregator	2021-07-30 12:39:44 -07:00
Harini Rajendran	995d99d9e4	add ingest/notices/queueSize metric to give visibility into supervisor notices queue size (#11417 )	2021-07-30 07:59:26 -07:00
Yuanli Han	b83742179a	Reduce method invocation of reservoir sampling (#11257 ) * reduce method invocation of reservoir sampling * add a dynamic parameter and add benchmark * rebase	2021-07-30 22:09:50 +08:00
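For background on the technique this commit optimizes, the sketch below is textbook reservoir sampling (Algorithm R), which draws a uniform sample of k items from a stream of unknown length in one pass. It is a generic illustration with a hypothetical `ReservoirSampler` name, not the Coordinator's sampler or the batched variant introduced by #11257.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Textbook Algorithm R; a generic illustration, not Druid's sampler. */
final class ReservoirSampler {
  static <T> List<T> sample(Iterator<T> items, int k, Random random) {
    List<T> reservoir = new ArrayList<>(k);
    int seen = 0;
    while (items.hasNext()) {
      T item = items.next();
      seen++;
      if (reservoir.size() < k) {
        reservoir.add(item); // fill the reservoir with the first k items
      } else {
        // Keep the seen-th item with probability k / seen.
        int j = random.nextInt(seen);
        if (j < k) {
          reservoir.set(j, item);
        }
      }
    }
    return reservoir;
  }
}
```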
Jonathan Wei	9b250c54aa	Allow kill task to mark segments as unused (#11501 ) * Allow kill task to mark segments as unused * Add IndexerSQLMetadataStorageCoordinator test * Update docs/ingestion/data-management.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Add warning to kill task doc Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-07-29 10:48:43 -05:00
Peter Marshall	0de1837ff7	Docs - partitioning note re: skew / dim concatenation + nav update (#11488 ) * Update native-batch.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400 * Update native-batch.md * Fixed broken link + some grammar	2021-07-27 09:17:01 -07:00
Kashif Faraz	8a4e27f51d	Select broker based on query context parameter `brokerService` (#11495 ) This change allows the selection of a specific broker service (or broker tier) by the Router. The newly added ManualTieredBrokerSelectorStrategy works as follows: (1) Check for the parameter brokerService in the query context. If this is a valid broker service, use it. (2) Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it. (3) Otherwise, move on to the next strategy.	2021-07-27 20:56:05 +05:30
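The three steps of ManualTieredBrokerSelectorStrategy can be sketched as a single lookup that returns an empty result when the Router should fall through to the next strategy. The class name, constructor, and the use of `Optional` are assumptions made for illustration; only `brokerService` and `defaultManualBrokerService` come from the commit message.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the manual broker selection steps; not Druid's class. */
final class ManualBrokerSelectorSketch {
  private final Set<String> validBrokerServices;
  private final String defaultManualBrokerService; // may be null

  ManualBrokerSelectorSketch(Set<String> validBrokerServices, String defaultManualBrokerService) {
    this.validBrokerServices = validBrokerServices;
    this.defaultManualBrokerService = defaultManualBrokerService;
  }

  Optional<String> select(Map<String, Object> queryContext) {
    // (1) brokerService from the query context, if it names a valid broker service.
    Object requested = queryContext.get("brokerService");
    if (requested instanceof String && validBrokerServices.contains(requested)) {
      return Optional.of((String) requested);
    }
    // (2) the strategy's defaultManualBrokerService, if set and valid.
    if (defaultManualBrokerService != null && validBrokerServices.contains(defaultManualBrokerService)) {
      return Optional.of(defaultManualBrokerService);
    }
    // (3) empty: the caller moves on to the next strategy.
    return Optional.empty();
  }
}
```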
Peter Marshall	60fdf7a734	Rollup measurement query amended (#11479 ) By user request from https://groups.google.com/g/druid-user/c/bFkOtE-1eQg - gives the measure as a floating point instead of an integer.	2021-07-27 06:29:29 -07:00
Maytas Monsereenusorn	c068906fca	Make intermediate store for shuffle tasks an extension point (#11492 ) * add interface * add docs * fix errors * fix injection * fix injection * update javadoc	2021-07-27 11:29:43 +07:00
Peter Marshall	973e5bf7d0	Docs - HLL lgK tip and slight layout change (#11482 ) * HLL lgK and a tip Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200. Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated. Also added a note about using K over 16 being pretty much pointless. * Corrected spelling * Create datasketches-hll.md Put roll-up back to rollup * Update docs/development/extensions-core/datasketches-hll.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2021-07-26 12:28:53 -07:00
Lucas Capistrant	9767b42e85	Add a new metric query/segments/count that is not emitted by default (#11394 ) * Add a new metric query/segments/count that is not emitted by default * docs * test the default implementation of the metric * fix spelling error in docs * document the fact that query retries will result in additional metric emissions * update using recommended text from @jihoonson	2021-07-22 17:57:35 -07:00
benkrug	167c45260c	Update druid-vs-kudu.md (#11470 ) small typo - "need" to "needed"	2021-07-21 22:58:14 +08:00
Maytas Monsereenusorn	6ce3b6ca2d	Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444 ) * fix doc * address comments * Update docs/configuration/index.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-07-21 12:48:56 +07:00
Paul Rogers	aa8c615ac2	Updates to source and doc build pages (#11464 ) * Updates to source and doc build pages. Clarifies a few points for newbies. * Fixed spelling error And added spellcheck info to website README file.	2021-07-20 18:07:34 -07:00
Abhishek Agarwal	94c1671eaf	Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466 ) This PR splits the current SegmentLoader into SegmentLoader and SegmentCacheManager. SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. The default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. This class will be used in SegmentManager to construct Segment objects. SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future will support reserving and releasing cache space (see #11398, "Make SegmentLoader extensible and customizable"). This class will be used in ingestion tasks such as compaction and re-indexing, where segment files need to be downloaded locally.	2021-07-21 00:14:19 +05:30
jerryleooo	c7fdf1d685	Fix typo in ingestion spec sample (#11433 ) * Update index.md Fix typo in the ingestion spec sample * fixed more typos	2021-07-19 22:02:21 -07:00
sthetland	a366753ba5	Consolidate multi-value dimension doc and highlight configurability (#11428 ) * Clarify options for multi-value dims * Add first example	2021-07-15 10:19:10 -07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to better handle the case when there are no currently running worker nodes (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Maytas Monsereenusorn	05d5dd9289	compaction/status API retains status for datasources that no longer exist, causing in-memory usage to grow unbounded (#11426 ) * compaction/status API retains status for datasources that no longer exist, causing in-memory usage to grow unbounded * compaction/status API retains status for datasources that no longer exist, causing in-memory usage to grow unbounded * compaction/status API retains status for datasources that no longer exist, causing in-memory usage to grow unbounded * fix test * fix test	2021-07-13 09:48:06 +07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * IntelliJ check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, rename feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
Joseph Glanville	d5e8d4d680	Avro union support (#10505 ) * Avro union support * Document new union support * Add support for AvroStreamInputFormat and fix checkstyle * Extend multi-member union test schema and format * Some additional docs and add Enums to spelling * Rename explodeUnions -> extractUnions * explode -> extract * ByType * Correct spelling error	2021-07-06 22:05:41 -07:00
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
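Bitwise aggregators fold a column of longs with AND, OR, or XOR. The sketch below shows that fold with a hypothetical `BitwiseAgg` class. Skipping nulls and returning null when every input is null are assumptions standing in for the null-handling options the commit mentions, not a description of them.

```java
import java.util.List;
import java.util.function.LongBinaryOperator;

/** Hypothetical sketch of BIT_AND / BIT_OR / BIT_XOR-style aggregation over longs. */
final class BitwiseAgg {
  static Long bitAnd(List<Long> values) {
    return aggregate(values, (a, b) -> a & b);
  }

  static Long bitOr(List<Long> values) {
    return aggregate(values, (a, b) -> a | b);
  }

  static Long bitXor(List<Long> values) {
    return aggregate(values, (a, b) -> a ^ b);
  }

  private static Long aggregate(List<Long> values, LongBinaryOperator op) {
    Long acc = null; // stays null until the first non-null input (assumption)
    for (Long v : values) {
      if (v == null) {
        continue; // nulls are skipped in this sketch
      }
      acc = (acc == null) ? v : op.applyAsLong(acc, v);
    }
    return acc;
  }
}
```

For inputs 12 (0b1100) and 10 (0b1010), the three folds give 8, 14, and 6 respectively.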
sthetland	fd0931d35e	Azure data lake input source (#11153 ) * Mention Azure Data Lake * Make consistent with other entries Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-06-25 15:54:34 -07:00
Hoseung Lee	ed0a57e106	Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-06-24 21:54:48 -07:00
Yi Yuan	de8daf8139	Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351 ) * delete_buildV9Directly_in_kafka_and_kinesis_indexing_service * delete * delete them from server * delete buildV9Directly from hadoop indexing * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-23 16:36:46 -07:00
Clint Wylie	bfbd7ec432	fix a bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Charles Smith	a1ed3a407d	clarify bySegment is native only (#11331 )	2021-06-11 13:48:17 -07:00
Yi Yuan	8de0d36c52	Allow query through router when load moving average extension (#11276 ) * init commit * change NoopQuerySegmentWalker name * change doc * move NoopQuerySegmentWalker and add document * fix doc Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-10 18:46:53 +08:00
Egor Riashin	9047fa3d9c	S3 ingestion can assume role (#10995 ) * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * tests fix * spelling fix * sts fix Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>	2021-06-09 16:02:35 +05:30
Yi Yuan	145cf9e5c3	fix document about input format (#11342 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-08 23:44:54 +08:00
frank chen	2ee7e31e5b	Fix syntax error (#11332 )	2021-06-07 22:35:02 -07:00
frank chen	d5139c9543	Fix permission problems in docker (#11299 ) * Create /opt/data to fix permission problem * eliminate symlink to avoid compatibility problem on AWS Fargate * Add a workaround section * Update instruction for named volume * Use named volume in docker-compose * Revert some doc change * Resolve review comments	2021-06-01 17:33:27 -07:00
frank chen	e664bfd433	Improve doc of movingAverage (#11262 ) * Make doc more directive Signed-off-by: frank chen <frank.chen021@outlook.com> * Add limitation Signed-off-by: frank chen <frank.chen021@outlook.com> * Suppress spelling check error	2021-05-28 13:10:55 +08:00
frank chen	60843bd11f	Add configuration suggestion to `druid.indexer.storage.type` (#11304 )	2021-05-27 06:44:47 -07:00
Xavier Léauté	b517c3339b	remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073 ) With this change, Druid will only support ZooKeeper 3.5.x and later. In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x (see #10780 for the detailed reasons) * remove ZooKeeper 3.4.x compatibility * exclude additional ZK 3.5.x netty dependencies to ensure we use our version * keep ZooKeeper version used for integration tests in sync with client library version * remove the need to specify ZK version at runtime for docker * add support to run integration tests with JDK 15 * build and run unit tests with Java 15 in travis	2021-05-25 12:49:49 -07:00
Agustin Gonzalez	4ba5738ffb	Add an issues section to deal with common issues when building druid (#11271 )	2021-05-21 09:04:51 -07:00
Charles Smith	403dcf5cfb	fixes some typos, edits for style (#11258 )	2021-05-21 08:58:39 -07:00
Charles Smith	fcb4eaa3d4	add docs for high-churn datasource cleanup (#11245 ) * add docs for high-churn datasource cleanup * fix most comments except for task log * address comments * update strategy recommendation * address addtional comments * fix * address comments * address comments from @sthetland	2021-05-20 09:48:42 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00

2947 Commits