druid

Commit Graph

Author	SHA1	Message	Date
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
Victoria Lim	1506b26ce4	fix typo (#12607 )	2022-06-04 13:14:18 +08:00
Gian Merlino	a27f4f5740	Service stdout log files, move logs to log/. (#12570 ) * Service stdout log files, move logs to log/. Two changes that make log behavior cleaner: 1) Redirect messages from the Java runtime to their own log files. Otherwise, they would get jumbled up in the output of the all-in-one start command. 2) Use log/ instead of bin/log/ for the default log directory. Makes them easier to find. Additionally, add documentation about how to avoid the reflective access warnings in Java 11. * Spelling. * See if code formatting affects spelling.	2022-06-03 10:44:29 +05:30
Jill Osborne	9c8e6bb000	Addition to Multitenancy considerations doc (#12567 ) * Small addition to Multitenancy considerations doc * Update docs/querying/multitenancy.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update multitenancy.md Edit suggested by @kfaraz Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-06-02 10:32:14 -07:00
Dr. Sizzles	7291c92f4f	Adding zstandard compression library (#12408 ) * Adding zstandard compression library * 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct. 2. Cleaned up checkstyle issues. * Fixing zstandard version to latest stable version in pom's and updating license files * Removing zstd from benchmarks and adding to processing (poms) * fix the intellij inspection issue * Removing the prefix v for the version in the license check for zstd * Fixing license checks Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-05-28 17:01:44 -07:00
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
Gian Merlino	37853f8de4	ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513 ) * ConcurrentGrouper: Add option to always slice up merge buffers thread-locally. Normally, the ConcurrentGrouper shares merge buffers across processing threads until spilling starts, and then switches to a thread-local model. This minimizes memory use and reduces likelihood of spilling, which is good, but it creates thread contention. The new mergeThreadLocal option causes a query to start in thread-local mode immediately, and allows us to experiment with the relative performance of the two modes. * Fix grammar in docs. * Fix race in ConcurrentGrouper. * Fix issue with timeouts. * Remove unused import. * Add "tradeoff" to dictionary.	2022-05-21 10:28:54 -07:00
Katya Macedo	5073cee73f	Fix zookeeper spelling (#12556 )	2022-05-21 16:14:02 +08:00
Gian Merlino	65a1375b67	SQL: Add is_active to sys.segments, update examples and docs. (#11550 ) * SQL: Add is_active to sys.segments, update examples and docs. is_active is short for: (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1 It's important because this represents "all the segments that should be queryable, whether or not they actually are right now". Most of the time, this is the set of segments that people will want to look at. The web console already adds this filter to a lot of its queries, proving its usefulness. This patch also reworks the caveat at the bottom of the sys.segments section, so its information is mixed into the description of each result field. This should make it more likely for people to see the information. * Wording updates. * Adjustments for spellcheck. * Adjust IT.	2022-05-19 14:23:28 -07:00
Charles Smith	3e8d7a6d9f	Sql docs items (#12530 ) * touch up sql refactor * brush up SQL refactor * incorporate feedback * reorder sql * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-05-17 16:56:31 -07:00
Katya Macedo	177638f171	Fix typo, add comma (#12529 )	2022-05-17 16:42:47 -07:00
Gian Merlino	fdfecfd996	Improved docs for range partitioning. (#12350 ) * Improved docs for range partitioning. 1) Clarify the benefits of range partitioning. 2) Clarify which filters support pruning. 3) Include the fact that multi-value dimensions cannot be used for partitioning. * Additional clarification. * Update other section. * Another adjustment. * Updates from review.	2022-05-16 09:42:31 -07:00
Hellmar Becker	985640f103	Clarify the use of the Lookup API (#12088 ) * Update lookups.md * Update docs/querying/lookups.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/querying/lookups.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2022-05-16 07:50:24 -07:00
317brian	351e57bdb6	docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459 ) * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works" This reverts commit `cadd1fdc60`. * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md fix spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-05-16 07:48:33 -07:00
Gian Merlino	5b6727f319	Enable vectorized virtual column processing by default. (#12520 ) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue	2022-05-16 15:43:53 +05:30
Frank Chen	c33ff1c745	Enforce console logging for peon process (#12067 ) Currently all Druid processes share the same log4j2 configuration file located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit the environment variables from the middle manager. These variables include those in the log4j2.xml that control which file the logger writes to. But the current task logging mechanism requires the peon processes to write the log to the console so that the middle manager can redirect the console output to a file and upload this file to task log storage. So, this PR imposes this requirement on peon processes: whatever the configuration in the shared log4j2.xml, peon processes always write the log to the console.	2022-05-16 15:07:21 +05:30
Gian Merlino	ff253fd8a3	Add setProcessingThreadNames context parameter. (#12514 ) Setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.	2022-05-16 13:42:00 +05:30
Lucas Capistrant	deb69d1bc0	Allow coordinator to be configured to kill segments in future (#10877 ) Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system. A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.	2022-05-11 07:35:15 +05:30
Kashif Faraz	60b4fa0f75	Docs: Fix column name in ingestion rollup doc (#12036 ) Fix the referred column name from "count" to "num_rows" as "count" vs. "COUNT(*)" might be a little confusing in this example.	2022-05-10 17:35:59 +05:30
Rohan Garg	75836a5a06	Add feature flag for sql planning of TimeBoundary queries (#12491 ) * Add feature flag for sql planning of TimeBoundary queries * fixup! Add feature flag for sql planning of TimeBoundary queries * Add documentation for enableTimeBoundaryPlanning * fixup! Add documentation for enableTimeBoundaryPlanning	2022-05-10 15:23:42 +05:30
Rohan Garg	2dd073c2cd	Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484 ) * Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * Document vectorized dimension	2022-05-09 10:40:17 -07:00
Victoria Lim	0206a2da5c	Update automatic compaction docs with consistent terminology (#12416 ) * specify automatic compaction where applicable * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update for style and consistency * implement suggested feedback * remove duplicate example * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/compaction.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/api-reference.md * update .spelling * Adopt review suggestions Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2022-05-03 16:22:25 -07:00
Rocky Chen	770ad95169	Add a metric for task duration in the pending queue (#12492 ) This PR measures how long a task stays in the pending queue and emits the value as the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner. An example of the metric: ``` 2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"} ``` Key changed/added classes in this PR: Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner. Update related factory classes and tests.	2022-05-02 23:47:25 -04:00
317brian	b97f273d5a	docs: fix typo (#12494 )	2022-05-01 22:44:31 +08:00
Charles Smith	42fa5c26e1	remove arbitrary granularity spec from docs (#12460 ) * remove arbitrary granularity spec from docs * Update docs/ingestion/ingestion-spec.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-28 16:36:54 -07:00
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * fix metric name inconsistency * add task slot metrics for middle managers * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * more stuff * remove unused variable * more stuff * add javadocs * fix checkstyle * fix hadoop test failure * cleanup * add more code coverage in tests * fix test failure * add docs * increase code coverage * fix spelling * fix failing tests * remove dead code * fix spelling	2022-04-26 11:44:44 -05:00
Peter Marshall	b47316b844	Update native-batch.md (#12478 ) Fixed indent on the Granularity Spec section and removed some superfluous tabbings.	2022-04-25 21:44:17 +08:00
Apoorv Gupta	4781af9921	Fix formatting in stats.md (#12470 ) * Fix formatting in stats.md * Update stats.md * Update docs/development/extensions-core/stats.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/development/extensions-core/stats.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-23 11:35:08 +08:00
Victoria Lim	63a993c33a	stringFirst and stringLast supported in ingestion (#12466 )	2022-04-22 10:28:49 +08:00
Victoria Lim	f95447070e	updated docs for sql query context (#12406 )	2022-04-21 11:19:39 -07:00
jacobtolar	0edc22179c	Document expression post-aggregators (#11896 ) * Document expression post-aggregators * Update docs/querying/post-aggregations.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-19 10:36:19 +08:00
Victoria Lim	c86c48203e	recommendation for comparing strings and numbers (#12442 )	2022-04-18 09:28:32 -07:00
Peter Marshall	5167d328b1	Docs - query caching (#11584 ) * Update caching.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900 Update caching.md A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300 * Update caching.md Typos * Amendments on the segment cache Significant updates on content around the segment cache, pull process, and in-memory cache * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update basic-cluster-tuning.md typo * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Whole-query caching update Made more succinct and removed specific config to change. * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-18 17:00:21 +08:00
Charles Smith	408b46ae9f	Fixes a small typo in ingestion spec doc (#12143 ) * small typo * Update docs/ingestion/ingestion-spec.md Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2022-04-18 16:53:50 +08:00
Peter Marshall	1201c9b2e5	Docs - added another common config property to tuningConfig (#11935 ) * Update ingestion-spec.md Added indexSpecForIntermediatePersists as a common configuration property. * Update ingestion-spec.md Amended to remove "below" and add link to the table. * Update ingestion-spec.md Removed passive.	2022-04-18 13:41:39 +08:00
Alexandre BERTHIOT	9f2b37f250	Update tutorial-compaction.md to change an unclear statement (#11988 ) * Update tutorial-compaction.md Unclear statement on the explanation of tuningConfig section. * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-04-18 13:25:09 +08:00
Maytas Monsereenusorn	5d37d9f9d8	Add docs to metric spec for auto compaction (#12415 ) * add docs * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update index.md * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-13 13:27:00 -07:00
Katya Macedo	f24e9c6862	Add Kinesis ListShards permission (#12387 ) * add Kinesis permission * List Kinesis IAM permissions * Adopt review suggestions * Fix merge conflicts	2022-04-13 15:29:56 +05:30
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
mark-imply	bf96ddf5ba	Update index.md (#12390 ) Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.	2022-04-08 18:01:54 +05:30
mark-imply	d98cbd90f0	Update basic-cluster-tuning.md (#12412 ) Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.	2022-04-08 15:29:55 +05:30
317brian	d82a8185d1	fix(docs): clarify what s3 permissions are needed based on the access management type (#12405 ) * fix(docs): clarify what s3 permissions are needed based on the permissions model * fix typo * Update docs/development/extensions-core/s3.md Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-04-07 16:22:56 -07:00
Victoria Lim	e6229b76a6	Document data format and example for featureSpec (#12394 ) * add data format and example for featureSpec * add second feature in example * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-06 15:17:15 -07:00
317brian	ac6c24793e	docs(fix): add clarity around granularitySpec (#12362 ) * fix: add clarity around granularitySpec * fix spacing * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-06 09:24:37 -07:00
Victoria Lim	d326c681c1	Document config for ingesting null columns (#12389 ) * config for ingesting null columns * add link * edit .spelling * what happens if storeEmptyColumns is disabled	2022-04-05 09:15:42 -07:00
AmatyaAvadhanula	067254b778	Package kinesis client jar within the extension (#12370 ) amazon-kinesis-client was not covered under the Apache license and required separate insertion in the kinesis extension. This can now be avoided since it is covered, and including it within Druid helps prevent incompatibilities. Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with Druid for Kinesis ingestion.	2022-04-04 21:31:18 +05:30
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
AmatyaAvadhanula	c5531be553	Add feature flag for Kinesis listShards API usage (#12383 ) listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161. However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html). A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.	2022-04-04 14:58:10 +05:30
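
The `is_active` definition quoted in commit 65a1375b67 can be read as a plain boolean predicate. The sketch below is illustrative Python only, not Druid code: the `SegmentRow` class and the `is_active` function are hypothetical names, and the three fields simply mirror the `sys.segments` columns named in that commit message, assuming each flag holds 0 or 1.

```python
from dataclasses import dataclass


@dataclass
class SegmentRow:
    """Hypothetical stand-in for one row of sys.segments (each flag is 0 or 1)."""
    is_published: int
    is_overshadowed: int
    is_realtime: int


def is_active(row: SegmentRow) -> int:
    """Return 1 if the segment should be queryable, following the commit message:
    (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1."""
    published_and_current = row.is_published == 1 and row.is_overshadowed == 0
    return 1 if (published_and_current or row.is_realtime == 1) else 0
```

A published segment that has been overshadowed is not active under this reading, while a realtime segment counts as active even before it is published.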


2543 Commits
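
The negative `druid.coordinator.kill.durationToRetain` behaviour described in commit deb69d1bc0 can be pictured as a cutoff computed relative to the current time. This is a minimal sketch under stated assumptions, not the coordinator's actual logic: the function name `eligible_for_kill` and its parameters are hypothetical, the segment is assumed to be already marked unused, and the boundary is treated as inclusive because the commit message says "24 hours or less".

```python
from datetime import datetime, timedelta


def eligible_for_kill(
    interval_end: datetime,
    now: datetime,
    duration_to_retain: timedelta,
    ignore_duration_to_retain: bool = False,
) -> bool:
    """Hypothetical eligibility check for a segment already marked unused.

    duration_to_retain models druid.coordinator.kill.durationToRetain;
    a negative value pushes the cutoff into the future.
    ignore_duration_to_retain models
    druid.coordinator.kill.ignoreDurationToRetain.
    """
    if ignore_duration_to_retain:
        # interval_end is disregarded entirely, as the commit message describes.
        return True
    # With PT-24H, the cutoff is now + 24 hours (assumed inclusive).
    cutoff = now - duration_to_retain
    return interval_end <= cutoff
```

With `duration_to_retain=timedelta(hours=-24)`, a segment whose `interval_end` lies 12 hours ahead of `now` passes the check, while one ending 25 hours ahead does not unless `ignore_duration_to_retain` is set.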