Commit Graph

2566 Commits

Author SHA1 Message Date
Gian Merlino 2912a36a20
Use nonzero default value of maxQueuedBytes. (#12840)
* Use nonzero default value of maxQueuedBytes.

The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default was unlimited; this patch changes it to a
relatively conservative 25MB.

This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large result sets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.

* Alter calculation.
2022-08-02 17:57:27 -07:00
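The backpressure idea behind maxQueuedBytes above can be sketched in a few lines: keep a running byte count for a query's queued response data and tell the reader to pause once it reaches the limit. This is a hypothetical Python illustration, not Druid's Broker code. The class and method names are invented, and treating a non-positive limit as "unlimited" is an assumption based on the title's "nonzero default".

```python
from collections import deque


class ByteBoundedQueue:
    """Hypothetical sketch of a byte-capped queue that signals backpressure."""

    def __init__(self, max_queued_bytes: int):
        # Assumption: a non-positive limit means "unlimited" (the old behavior).
        self.max_queued_bytes = max_queued_bytes
        self.queued_bytes = 0
        self.chunks = deque()

    def _under_limit(self) -> bool:
        return self.max_queued_bytes <= 0 or self.queued_bytes < self.max_queued_bytes

    def offer(self, chunk: bytes) -> bool:
        """Queue a chunk; return False if the reader should pause reading."""
        self.chunks.append(chunk)
        self.queued_bytes += len(chunk)
        return self._under_limit()

    def poll(self) -> bytes:
        """Hand the oldest chunk to the consumer and release its bytes."""
        chunk = self.chunks.popleft()
        self.queued_bytes -= len(chunk)
        return chunk

    def can_resume(self) -> bool:
        """True once the consumer has drained the queue below the limit."""
        return self._under_limit()
```

In this sketch the chunk that crosses the limit is still accepted, so the cap bounds memory approximately rather than exactly; a smaller limit trades throughput for a lower risk of running out of memory, which is the tradeoff the commit describes.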
317brian 553ff47616
fix: fix broken link to Class TTest (#12836) 2022-07-31 10:18:14 +08:00
Charles Smith efbb58e90e
docs: remove maxRowsPerSegment where appropriate (#12071)
* remove maxRowsPerSegment where appropriate

* fix tutorial, accept suggestions

* Update docs/design/coordinator.md

* additional tutorial file

* fix initial index spec

* accept comments

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add back comment on maxrows per segment

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rm duplicate entry

* Update native-batch-simple-task.md

remove ref to `maxrowspersegment`

* Update native-batch.md

remove ref to `maxrowspersegment`

* final tenticles

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-28 16:52:13 +05:30
Atul Mohan 93a9a4b1c5
Add retention for file request logs (#12559)
* Add retention for file request logs

* Spelling
2022-07-27 08:17:02 -07:00
Charles Smith d7d4314367
remove ref to plywood repo (#12809) 2022-07-26 10:12:13 +08:00
Victoria Lim 6394ecfd21
update figure and reference (#12813) 2022-07-22 15:54:25 -07:00
Katya Macedo a2be685824
Remove the time bit, fix headings (#12808)
* Remove the time bit, fix headings

* Adopt review suggestions

* Edits

* Update smoosh file description

* Adopt review suggestions

* Update spelling
2022-07-20 15:37:57 -07:00
Katya Macedo 809bf161ce
Add a note about setting the value of maxNumConcurrentSubTasks (#12772)
* Add clarification for combining input source

* Update inputFormat note

* Update maxNumConcurrentSubTasks note

* Fix broken link

* Update docs/ingestion/native-batch-input-source.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-07-19 15:34:21 -07:00
Atul Mohan 75045970cd
S3 Ingestion from non-default endpoints (#11798)
* Add endpoint support for s3inputsource

* Changes to tests

* Fix docs

* Fix config

* Fix inspections

* Fix spelling

* Remove password from toString
2022-07-15 11:03:34 -07:00
Jianhuan Liu d4403c15aa
Upgrade prometheus version, add more labels to PrometheusEmitter (#12769)
Changes:
- Upgrade prometheus to version 0.16.0
- Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`
2022-07-15 14:43:12 +05:30
Frank Chen a544aff761
Document missed simple granularities (#12768)
* Document missed simple granularities

* Update docs/querying/granularities.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/querying/granularities.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-14 14:02:28 +08:00
zachjsh c0380e7b0a
* fix duplicate dimension (#12778) 2022-07-14 10:39:03 +05:30
Victoria Lim d8f8c56f94
Docs: Index page with all SQL functions (#12771)
* list of all functions

* add function names to spelling file
2022-07-14 09:59:55 +08:00
TSFenwick 8c02880d5f
Emit metrics for distribution of number of rows per segment (#12730)
* initial commit of bucket dimensions for metrics

return counts of segments that have rowcount in a bucket size for a datasource
return average value of rowcount per segment in a datasource
added unit test
naming could use a lot of work
buckets right now are not finalized
added javadocs
altered metrics.md

* fix checkstyle issues

* addressed review comments

add monitor test
move added functionality to new monitor
update docs

* address comments

renamed monitor
handle tombstones better
update docs
added javadocs

* Add support for tombstones in the segment distribution

* undo changes to tombstone segmentizer factory

* fix accidental whitespacing changes

* address comments regarding metrics documentation

and rename variable to be more accurate

* fix tests

* fix checkstyle issues

* fix broken test

* undo removal of timeout
2022-07-12 07:04:42 -07:00
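The per-datasource statistics described in this commit (segment counts per row-count bucket plus the average row count per segment) can be sketched as below. This is a hypothetical Python illustration: the bucket boundaries are invented, since the commit itself says the buckets were not finalized, and the function names are not Druid's.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket upper bounds (rows); the commit says buckets were not finalized.
BUCKET_BOUNDS = [10_000, 2_000_000, 4_000_000, 6_000_000, 8_000_000, 10_000_000]


def bucket_label(row_count, bounds=BUCKET_BOUNDS):
    """Return a "lower-upper" label for the bucket that holds row_count."""
    i = bisect_right(bounds, row_count)
    lower = 0 if i == 0 else bounds[i - 1]
    return f"{lower}+" if i == len(bounds) else f"{lower}-{bounds[i]}"


def summarize(row_counts):
    """Count segments per bucket and average rows per segment for one datasource."""
    counts = Counter(bucket_label(n) for n in row_counts)
    average = sum(row_counts) / len(row_counts) if row_counts else 0.0
    return counts, average
```

Buckets here are half-open, so a segment with exactly 10,000 rows lands in the "10000-2000000" bucket.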
Gian Merlino 97207cdcc7
Automatic sizing for GroupBy dictionaries. (#12763)
* Automatic sizing for GroupBy dictionary sizes.

Merging and selector dictionary sizes currently both default to 100MB.
This is not optimal, because it can lead to OOM on small servers and
insufficient resource utilization on larger servers. It also invites
end users to try to tune it when queries run out of dictionary space,
which can make things worse if the end user sets it too high.

So, this patch:

- Adds automatic tuning for selector and merge dictionaries. Selectors
  use up to 15% of the heap and merge buffers use up to 30% of the heap
  (aggregate across all queries).

- Updates out-of-memory error messages to emphasize enabling disk
  spilling vs. increasing memory parameters. With the memory parameters
  automatically sized, it is more likely that an end user will get
  benefit from enabling disk spilling.

- Removes the query context parameters that allow lowering of configured
  dictionary sizes. These complicate the calculation, and I don't see a
  reasonable use case for them.

* Adjust tests.

* Review adjustments.

* Additional comment.

* Remove unused import.
2022-07-11 08:20:50 -07:00
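The arithmetic behind the automatic sizing above can be sketched as follows. The 15% and 30% heap fractions come from the commit message. How those aggregate budgets are divided among concurrent users is an assumption made for illustration (one selector dictionary per processing thread, one merge dictionary per merge buffer), as is the function name.

```python
SELECTOR_HEAP_PERCENT = 15  # from the commit message
MERGE_HEAP_PERCENT = 30     # from the commit message


def auto_dictionary_sizes(max_heap_bytes, num_processing_threads, num_merge_buffers):
    """Split heap budgets into per-user dictionary sizes (divisors are assumptions)."""
    selector_total = max_heap_bytes * SELECTOR_HEAP_PERCENT // 100
    merge_total = max_heap_bytes * MERGE_HEAP_PERCENT // 100
    # Assumption: one selector dictionary per processing thread and one merge
    # dictionary per merge buffer can be in use at the same time.
    return (selector_total // num_processing_threads,
            merge_total // num_merge_buffers)
```

For example, under these assumptions a 1 GB heap with 4 processing threads and 2 merge buffers gives 37.5 MB per selector dictionary and 150 MB per merge dictionary, so the sizes scale with the server instead of being a fixed 100MB.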
Jill Osborne 682ea7f32d
IMPLY-12348: Update description of UNION ALL in SQL syntax doc (#12710)
* IMPLY-12348: Updated description of UNION ALL

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update sql.md

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-05 13:08:01 -07:00
Rui Chen 068bea6334
deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, sometimes you don't have control over the input folder: upstream producers can put any files they want in it. In this situation, S3InputSource is unusable.

Most people in my position solved it by using Airflow to fetch the full list of Parquet files and pass it to Druid. But doing this bloats the JSON spec. We had a situation where one of our JSON specs was 16MB, which is simply too large for the Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid perform the filtering of the input files.

I am using glob notation to be consistent with the LocalFirehose syntax.
2022-06-24 11:40:08 +05:30
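The effect of a filter such as {"filter": "*.parquet"} can be illustrated with Python's standard fnmatch module. This is a sketch of glob filtering in general, not Druid's implementation: matching against the full object key is an assumption, and fnmatch's rules (for example, `*` also matching `/`) may differ from the matcher Druid uses.

```python
from fnmatch import fnmatchcase


def filter_objects(object_keys, pattern):
    """Keep only object keys that match a glob such as "*.parquet".

    Assumption: the pattern is matched against the full key. In Python's
    fnmatch, "*" also matches "/" characters, and fnmatchcase is case-sensitive.
    """
    return [key for key in object_keys if fnmatchcase(key, pattern)]
```

With this approach the ingestion spec only carries the prefix and the pattern, rather than an explicit list of every Parquet file.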
Gian Merlino d29343cbe3
Disable autokill of segments by default. (#12693)
Also add clarifying commentary to the documentation about how durationToRetain works.
2022-06-23 17:17:11 -07:00
Tejaswini Bandlamudi 99e1b4efee
Update default value of `inputSegmentSizeBytes` in configuration docs (#12678) 2022-06-22 09:05:03 +05:30
Gian Merlino 0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
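Because TIME_IN_INTERVAL is planned as the equivalent of the >= and < operators, a call such as TIME_IN_INTERVAL(__time, '2000-01-01/2000-01-02') keeps rows with a start-inclusive, end-exclusive check. The Python sketch below shows only that half-open semantics; the function is hypothetical, and it handles just the "start/end" form with naive timestamps (no ISO durations such as "2000-01-01/P1D", no time zones).

```python
from datetime import datetime


def time_in_interval(ts: datetime, interval: str) -> bool:
    """Return True if start <= ts < end for an ISO 8601 "start/end" interval."""
    start_text, end_text = interval.split("/")
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    # Same half-open semantics as writing ts >= start AND ts < end.
    return start <= ts < end
```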
Jill Osborne f050069767
Segments doc update (#12344)
* Corrected heading levels in segments doc

* IMPLY-18394: Updated Segments doc

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update segments.md

* Updated links to changed headings in Segments doc

* Corrected spelling error

* Update segments.md

Incorporated suggestions from Paul Rogers.

* Update index.md

* Update segments.md

* Update segments.md

* Update segments.md

* Update compaction.md

* Update docs/design/segments.md

fix typo

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-06-16 13:25:17 -07:00
Victoria Lim 94564b6ce6
Update screenshots for Druid console doc (#12593)
* druid console doc updates

* remove extra image

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* updated screenshot labels

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-15 16:42:20 -07:00
Victoria Lim 353475bd36
Docs for automatic compaction (#12569)
* docs for auto-compaction

* fix broken links

* another link

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>

* reorg content for skipOffset

* Update docs/ingestion/automatic-compaction.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-06-09 14:55:12 -07:00
Gian Merlino a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Victoria Lim 1506b26ce4
fix typo (#12607) 2022-06-04 13:14:18 +08:00
Gian Merlino a27f4f5740
Service stdout log files, move logs to log/. (#12570)
* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. Makes them
   easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.
2022-06-03 10:44:29 +05:30
Jill Osborne 9c8e6bb000
Addition to Multitenancy considerations doc (#12567)
* Small addition to Multitenancy considerations doc

* Update docs/querying/multitenancy.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update multitenancy.md

Edit suggested by @kfaraz

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-02 10:32:14 -07:00
Dr. Sizzles 7291c92f4f
Adding zstandard compression library (#12408)
* Adding zstandard compression library

* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.

* Fixing zstandard version to latest stable version in POMs and updating license files

* Removing zstd from benchmarks and adding to processing (POMs)

* fix the intellij inspection issue

* Removing the prefix v for the version in the license check for zstd

* Fixing license checks

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
2022-05-28 17:01:44 -07:00
Agustin Gonzalez 2f3d7a4c07
Emit state of replace and append for native batch tasks (#12488)
* Emit state of replace and append for native batch tasks

* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)

* Add metric to compaction job

* Avoid null ptr exc when null emitter

* Coverage

* Emit tombstone & segment counts

* Tasks need a type

* Spelling

* Integrate BatchIngestionMode in batch ingestion tasks functionality

* Typos

* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.

* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.

* Spelling

* Avoid polluting the Task interface

* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
2022-05-23 12:32:47 -07:00
Gian Merlino 37853f8de4
ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513)
* ConcurrentGrouper: Add option to always slice up merge buffers thread-locally.

Normally, the ConcurrentGrouper shares merge buffers across processing
threads until spilling starts, and then switches to a thread-local model.
This minimizes memory use and reduces likelihood of spilling, which is
good, but it creates thread contention. The new mergeThreadLocal option
causes a query to start in thread-local mode immediately, and allows us
to experiment with the relative performance of the two modes.

* Fix grammar in docs.

* Fix race in ConcurrentGrouper.

* Fix issue with timeouts.

* Remove unused import.

* Add "tradeoff" to dictionary.
2022-05-21 10:28:54 -07:00
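To illustrate what "slicing up merge buffers thread-locally" means: instead of sharing one structure across processing threads, each thread gets its own fixed region of the buffer up front. This removes contention but gives each thread less room, so spilling can start sooner, matching the tradeoff described above. The sketch is hypothetical Python, not ConcurrentGrouper's code.

```python
def thread_local_slices(buffer, num_threads):
    """Carve a merge buffer into equal, non-overlapping per-thread slices.

    Each slice is a memoryview into the same buffer, so no bytes are copied.
    Any remainder bytes (len(buffer) % num_threads) are left unused.
    """
    view = memoryview(buffer)
    size = len(buffer) // num_threads
    return [view[i * size:(i + 1) * size] for i in range(num_threads)]
```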
Katya Macedo 5073cee73f
Fix zookeeper spelling (#12556) 2022-05-21 16:14:02 +08:00
Gian Merlino 65a1375b67
SQL: Add is_active to sys.segments, update examples and docs. (#11550)
* SQL: Add is_active to sys.segments, update examples and docs.

is_active is short for:

  (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1

It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.

The web console already adds this filter to a lot of its queries,
proving its usefulness.

This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.

* Wording updates.

* Adjustments for spellcheck.

* Adjust IT.
2022-05-19 14:23:28 -07:00
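The is_active definition above can be checked against a few sample rows. In SQL this is simply a filter on sys.segments with is_active = 1; the Python sketch below evaluates the same boolean expression over hypothetical rows.

```python
def is_active(segment):
    """Mirror of the is_active definition given in the commit message."""
    return ((segment["is_published"] == 1 and segment["is_overshadowed"] == 0)
            or segment["is_realtime"] == 1)


# Hypothetical rows: published, published-but-overshadowed, and realtime.
segments = [
    {"id": "a", "is_published": 1, "is_overshadowed": 0, "is_realtime": 0},
    {"id": "b", "is_published": 1, "is_overshadowed": 1, "is_realtime": 0},
    {"id": "c", "is_published": 0, "is_overshadowed": 0, "is_realtime": 1},
]
active_ids = [s["id"] for s in segments if is_active(s)]
```

Segment "b" is excluded because an overshadowed segment should not be queryable even though it is published.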
Charles Smith 3e8d7a6d9f
Sql docs items (#12530)
* touch up sql refactor

* brush up SQL refactor

* incorporate feedback

* reorder sql

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-05-17 16:56:31 -07:00
Katya Macedo 177638f171
Fix typo, add comma (#12529) 2022-05-17 16:42:47 -07:00
Gian Merlino fdfecfd996
Improved docs for range partitioning. (#12350)
* Improved docs for range partitioning.

1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.

* Additional clarification.

* Update other section.

* Another adjustment.

* Updates from review.
2022-05-16 09:42:31 -07:00
Hellmar Becker 985640f103
Clarify the use of the Lookup API (#12088)
* Update lookups.md

* Update docs/querying/lookups.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/querying/lookups.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2022-05-16 07:50:24 -07:00
317brian 351e57bdb6
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)
* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"

This reverts commit cadd1fdc60.

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

fix spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-05-16 07:48:33 -07:00
Gian Merlino 5b6727f319
Enable vectorized virtual column processing by default. (#12520)
In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: it's too fiddly. Nonvectorized processing is not guaranteed to be faster even with the optimization, because the optimization would have to overcome the inherent speed advantage of vectorization. So it would always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and users who explicitly disabled vectorization will likely leave it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue.
2022-05-16 15:43:53 +05:30
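The nonvectorized optimization mentioned above amounts to a one-entry cache: when consecutive rows have the same __time value, the previous time_floor result is reused instead of being recomputed. The Python sketch below is hypothetical (it is not SingleLongInputCachingExpressionColumnValueSelector) and only handles fixed-length periods in epoch milliseconds, with no calendar or time-zone logic.

```python
class CachingTimeFloor:
    """Floors epoch-millisecond timestamps, reusing the last result for repeats."""

    def __init__(self, period_millis):
        self.period_millis = period_millis
        self.last_input = None
        self.last_output = None
        self.computations = 0  # counts how often the floor was actually computed

    def floor(self, t_millis):
        if t_millis != self.last_input:
            self.last_input = t_millis
            self.last_output = t_millis - t_millis % self.period_millis
            self.computations += 1
        return self.last_output
```

The benefit depends on how many repeated values arrive in a row, which is why the commit argues it can't be assumed to beat vectorized processing without testing.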
Frank Chen c33ff1c745
Enforce console logging for peon process (#12067)
Currently, all Druid processes share the same log4j2 configuration file, located in the _common directory. Since peon processes are spawned by the middle manager process, they inherit the middle manager's environment variables. These include the variables referenced in log4j2.xml that control which file the logger writes to.

However, the current task logging mechanism requires peon processes to write their logs to the console, so that the middle manager can redirect the console output to a file and upload this file to task log storage.

So, this PR enforces this requirement for peon processes: regardless of the configuration in the shared log4j2.xml, peon processes always write their logs to the console.
2022-05-16 15:07:21 +05:30
Gian Merlino ff253fd8a3
Add setProcessingThreadNames context parameter. (#12514)
Setting thread names takes a measurable amount of time when segment scans are very quick. In high-QPS testing, we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
2022-05-16 13:42:00 +05:30
Lucas Capistrant deb69d1bc0
Allow coordinator to be configured to kill segments in future (#10877)
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time the system generates the kill task.

A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
2022-05-11 07:35:15 +05:30
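The rule above can be expressed as a cutoff of "now minus durationToRetain": a negative duration such as PT-24H pushes the cutoff 24 hours into the future. This is a simplified, hypothetical Python sketch (timedelta(hours=-24) stands in for PT-24H); it assumes the caller passes only segments already marked unused, and it is not the Coordinator's actual code.

```python
from datetime import datetime, timedelta


def eligible_for_kill(interval_end, now, duration_to_retain,
                      ignore_duration_to_retain=False):
    """Decide whether an *unused* segment may be killed (simplified sketch)."""
    if ignore_duration_to_retain:
        # Any unused segment qualifies, regardless of its interval_end.
        return True
    # A negative duration_to_retain moves the cutoff into the future.
    cutoff = now - duration_to_retain
    return interval_end <= cutoff
```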
Kashif Faraz 60b4fa0f75
Docs: Fix column name in ingestion rollup doc (#12036)
Fix the referenced column name from "count" to "num_rows", as "count" vs. "COUNT(*)" might be a little confusing in this example.
2022-05-10 17:35:59 +05:30
Rohan Garg 75836a5a06
Add feature flag for sql planning of TimeBoundary queries (#12491)
* Add feature flag for sql planning of TimeBoundary queries

* fixup! Add feature flag for sql planning of TimeBoundary queries

* Add documentation for enableTimeBoundaryPlanning

* fixup! Add documentation for enableTimeBoundaryPlanning
2022-05-10 15:23:42 +05:30
Rohan Garg 2dd073c2cd
Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484)
* Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation

* fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation

* Document vectorized dimension
2022-05-09 10:40:17 -07:00
Victoria Lim 0206a2da5c
Update automatic compaction docs with consistent terminology (#12416)
* specify automatic compaction where applicable

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update for style and consistency

* implement suggested feedback

* remove duplicate example

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/compaction.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/api-reference.md

* update .spelling

* Adopt review suggestions

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2022-05-03 16:22:25 -07:00
Rocky Chen 770ad95169
Add a metric for task duration in the pending queue (#12492)
This PR measures how long a task stays in the pending queue and emits the value as the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.

An example of the metric:

```
2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}
```

------------------------------------------
Key changed/added classes in this PR:

- Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
- Update related factory classes and tests.
2022-05-02 23:47:25 -04:00
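The measurement behind task/pending/time boils down to recording a timestamp when a task enters the pending queue and computing the elapsed time when it leaves. The Python sketch below is hypothetical: the hook names are invented rather than taken from RemoteTaskRunner, and reporting in milliseconds is an assumption.

```python
import time


class PendingTimeTracker:
    """Records when a task enters the pending queue and reports how long it waited."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable for testing; returns seconds
        self.pending_since = {}

    def on_pending(self, task_id):
        self.pending_since[task_id] = self.clock()

    def on_assigned(self, task_id):
        """Return the wait in milliseconds, the value one would emit as task/pending/time."""
        started = self.pending_since.pop(task_id)
        return round((self.clock() - started) * 1000)
```

Using a monotonic clock keeps the measurement immune to wall-clock adjustments, and removing the entry on assignment keeps the map from growing without bound.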
317brian b97f273d5a
docs: fix typo (#12494) 2022-05-01 22:44:31 +08:00
Charles Smith 42fa5c26e1
remove arbitrary granularity spec from docs (#12460)
* remove arbitrary granularity spec from docs

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-04-28 16:36:54 -07:00
Gian Merlino a2bad0b3a2
Reduce allocations due to Jackson serialization. (#12468)
* Reduce allocations due to Jackson serialization.

This patch attacks two sources of allocations during Jackson
serialization:

1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
   DefaultSerializerProvider instance for each call. It has lots of
   fields and creates pressure on the garbage collector. So, this patch
   adds helper functions in JacksonUtils that enable reuse of
   SerializerProvider objects and updates various call sites to make
   use of this.

2) GroupByQueryToolChest copies the ObjectMapper for every query to
   install a special module that supports backwards compatibility with
   map-based rows. This isn't needed if resultAsArray is set and
   all servers are running Druid 0.16.0 or later. This release was a
   while ago. So, this patch disables backwards compatibility by default,
   which eliminates the need to copy the heavyweight ObjectMapper. The
   patch also introduces a configuration option that allows admins to
   explicitly enable backwards compatibility.

* Add test.

* Update additional call sites and add to forbidden APIs.
2022-04-27 14:17:26 -07:00