Commit Graph

2575 Commits

Author SHA1 Message Date
David Palmer 2855fb6ff8
Change Kafka Lookup Extractor to not register consumer group (#12842)
* change kafka lookups module to not commit offsets

The current behaviour of the Kafka lookup extractor is to not commit
offsets by assigning a unique ID to the consumer group and setting
auto.offset.reset to earliest. This does the job but also pollutes the
Kafka broker with a bunch of "ghost" consumer groups that will never again be
used.

To fix this, we now set enable.auto.commit to false, which prevents the
ghost consumer groups being created in the first place.

* update docs to include new enable.auto.commit setting behaviour

* update kafka-lookup-extractor documentation

Provide some additional detail on functionality and configuration.
Hopefully this will make it clearer how the extractor works for
developers who aren't so familiar with Kafka.

* add comments better explaining the logic of the code

* add spelling exceptions for kafka lookup docs
2022-08-09 16:14:22 +05:30
絵空事スピリット ebe783dbdc
Correct minor format issue (#12882) 2022-08-09 18:15:41 +08:00
Hamish Ball abd7a9748d
Remove kafka lookup records when a record is tombstoned (#12819)
* remove kafka lookup records from factory when record tombstoned

* update kafka lookup docs to include tombstone behaviour

* change test wait time down to 10ms

Co-authored-by: David Palmer <david.palmer@adscale.co.nz>
2022-08-09 10:42:51 +05:30
David Hergenroeder 533c39f35a
Fix rollup docs bullet formatting (#12876) 2022-08-09 10:10:07 +08:00
Suneet Saldanha 267b32c2e2
Set druid.processing.fifo to true by default (#12571) 2022-08-08 10:18:24 -07:00
Gian Merlino 01d555e47b
Adjust "in" filter null behavior to match "selector". (#12863)
* Adjust "in" filter null behavior to match "selector".

Now, both of them match numeric nulls if constructed with a "null" value.

This is consistent as far as native execution goes, but doesn't match
the behavior of SQL = and IN. So, to address that, this patch also
updates the docs to clarify that the native filters do match nulls.

This patch also updates the SQL docs to describe how Boolean logic is
handled in addition to how NULL values are handled.

Fixes #12856.

* Fix test.
2022-08-08 09:08:36 -07:00
AmatyaAvadhanula d294404924
Kinesis ingestion with empty shards (#12792)
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.

However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)

Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
2022-08-05 22:38:58 +05:30
Katya Macedo c6dd9dd4af
Fix typo in compaction.md (#12774) 2022-08-04 14:47:22 -07:00
Gian Merlino ef6811ef88
Improved Java 17 support and Java runtime docs. (#12839)
* Improved Java 17 support and Java runtime docs.

1) Add a "Java runtime" doc page with information about supported
   Java versions, garbage collection, and strong encapsulation..

2) Update asm and equalsverifier to versions that support Java 17.

3) Add additional "--add-opens" lines to surefire configuration, so
   tests can pass successfully under Java 17.

4) Switch openjdk15 tests to openjdk17.

5) Update FrameFile to specifically mention Java runtime incompatibility
   as the cause of not being able to use Memory.map.

6) Update SegmentLoadDropHandler to log an error for Errors too, not
   just Exceptions. This is important because an IllegalAccessError is
   encountered when the correct "--add-opens" line is not provided,
   which would otherwise be silently ignored.

7) Update example configs to use druid.indexer.runner.javaOptsArray
   instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)

* Adjustments.

* Use run-java in more places.

* Add run-java.

* Update .gitignore.

* Exclude hadoop-client-api.

Brought in when building on Java 17.

* Swap one more usage of java.

* Fix the run-java script.

* Fix flag.

* Include link to Temurin.

* Spelling.

* Update examples/bin/run-java

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2022-08-03 23:16:05 -07:00
Gian Merlino 2912a36a20
Use nonzero default value of maxQueuedBytes. (#12840)
* Use nonzero default value of maxQueuedBytes.

The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default is unlimited; this patch changes it to a
relatively conservative 25MB.

This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large resultsets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.

* Alter calculation.
2022-08-02 17:57:27 -07:00
317brian 553ff47616
fix: fix broken link to Class TTest (#12836) 2022-07-31 10:18:14 +08:00
Charles Smith efbb58e90e
docs: remove maxRowsPerSegment where appropriate (#12071)
* remove maxRowsPerSegment where appropriate

* fix tutorial, accept suggestions

* Update docs/design/coordinator.md

* additional tutorial file

* fix initial index spec

* accept comments

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add back comment on maxrows per segment

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rm duplicate entry

* Update native-batch-simple-task.md

remove ref to `maxrowspersegment`

* Update native-batch.md

remove ref to `maxrowspersegment`

* final tenticles

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-28 16:52:13 +05:30
Atul Mohan 93a9a4b1c5
Add retention for file request logs (#12559)
* Add retention for file request logs

* Spelling
2022-07-27 08:17:02 -07:00
Charles Smith d7d4314367
remove ref to plywood repo (#12809) 2022-07-26 10:12:13 +08:00
Victoria Lim 6394ecfd21
update figure and reference (#12813) 2022-07-22 15:54:25 -07:00
Katya Macedo a2be685824
Remove the time bit, fix headings (#12808)
* Remove the time bit, fix headings

* Adopt review suggestions

* Edits

* Update smoosh file description

* Adopt review suggestions

* Update spelling
2022-07-20 15:37:57 -07:00
Katya Macedo 809bf161ce
Add a note about setting the value of maxNumConcurrentSubTasks (#12772)
* Add clarification for combining input source

* Update inputFormat note

* Update maxNumConcurrentSubTasks note

* Fix broken link

* Update docs/ingestion/native-batch-input-source.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-07-19 15:34:21 -07:00
Atul Mohan 75045970cd
S3 Ingestion from non-default endpoints (#11798)
* Add endpoint support for s3inputsource

* Changes to tests

* Fix docs

* Fix config

* Fix inspections

* Fix spelling

* Remove password from toString
2022-07-15 11:03:34 -07:00
Jianhuan Liu d4403c15aa
Upgrade prometheus version, add more labels to PrometheusEmitter (#12769)
Changes:
- Upgrade prometheus to version 0.16.0
- Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`
2022-07-15 14:43:12 +05:30
Frank Chen a544aff761
Document missed simple granularities (#12768)
* Document missed simple granularities

* Update docs/querying/granularities.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/querying/granularities.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-14 14:02:28 +08:00
zachjsh c0380e7b0a
* fix duplicate dimension (#12778) 2022-07-14 10:39:03 +05:30
Victoria Lim d8f8c56f94
Docs: Index page with all SQL functions (#12771)
* list of all functions

* add function names to spelling file
2022-07-14 09:59:55 +08:00
TSFenwick 8c02880d5f
Emit metrics for distribution of number of rows per segment (#12730)
* initial commit of bucket dimensions for metrics

return counts of segments that have rowcount in a bucket size for a datasource
return average value of rowcount per segment in a datasource
added unit test
naming could use a lot of work
buckets right now are not finalized
added javadocs
altered metrics.md

* fix checkstyle issues

* addressed review comments

add monitor test
move added functionality to new monitor
update docs

* address comments

renamed monitor
handle tombstones better
update docs
added javadocs

* Add support for tombstones in the segment distribution

* undo changes to tombstone segmentizer factory

* fix accidental whitespacing changes

* address comments regarding metrics documentation

and rename variable to be more accurate

* fix tests

* fix checkstyle issues

* fix broken test

* undo removal of timeout
2022-07-12 07:04:42 -07:00
Gian Merlino 97207cdcc7
Automatic sizing for GroupBy dictionaries. (#12763)
* Automatic sizing for GroupBy dictionary sizes.

Merging and selector dictionary sizes currently both default to 100MB.
This is not optimal, because it can lead to OOM on small servers and
insufficient resource utilization on larger servers. It also invites
end users to try to tune it when queries run out of dictionary space,
which can make things worse if the end user sets it to too high.

So, this patch:

- Adds automatic tuning for selector and merge dictionaries. Selectors
  use up to 15% of the heap and merge buffers use up to 30% of the heap
  (aggregate across all queries).

- Updates out-of-memory error messages to emphasize enabling disk
  spilling vs. increasing memory parameters. With the memory parameters
  automatically sized, it is more likely that an end user will get
  benefit from enabling disk spilling.

- Removes the query context parameters that allow lowering of configured
  dictionary sizes. These complicate the calculation, and I don't see a
  reasonable use case for them.

* Adjust tests.

* Review adjustments.

* Additional comment.

* Remove unused import.
2022-07-11 08:20:50 -07:00
Jill Osborne 682ea7f32d
IMPLY-12348: Update description of UNION ALL in SQL syntax doc (#12710)
* IMPLY-12348: Updated description of UNION ALL

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update sql.md

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-05 13:08:01 -07:00
Rui Chen 068bea6334
deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable.

Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files.

I am using the glob notation to be consistent with the LocalFirehose syntax.
2022-06-24 11:40:08 +05:30
Gian Merlino d29343cbe3
Disable autokill of segments by default. (#12693)
Also add clarifying commentary to the documentation about how durationToRetain works.
2022-06-23 17:17:11 -07:00
Tejaswini Bandlamudi 99e1b4efee
Update default value of `inputSegmentSizeBytes` in configuration docs (#12678) 2022-06-22 09:05:03 +05:30
Gian Merlino 0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
Jill Osborne f050069767
Segments doc update (#12344)
* Corrected heading levels in segments doc

* IMPLY-18394: Updated Segments doc

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update segments.md

* Updated links to changed headings in Segments doc

* Corrected spelling error

* Update segments.md

Incorporated suggestions from Paul Rogers.

* Update index.md

* Update segments.md

* Update segments.md

* Update segments.md

* Update compaction.md

* Update docs/design/segments.md

fix typo

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-06-16 13:25:17 -07:00
Victoria Lim 94564b6ce6
Update screenshots for Druid console doc (#12593)
* druid console doc updates

* remove extra image

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* updated screenshot labels

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-15 16:42:20 -07:00
Victoria Lim 353475bd36
Docs for automatic compaction (#12569)
* docs for auto-compaction

* fix broken links

* another link

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>

* reorg content for skipOffset

* Update docs/ingestion/automatic-compaction.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-06-09 14:55:12 -07:00
Gian Merlino a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Victoria Lim 1506b26ce4
fix typo (#12607) 2022-06-04 13:14:18 +08:00
Gian Merlino a27f4f5740
Service stdout log files, move logs to log/. (#12570)
* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. Makes them
   easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.
2022-06-03 10:44:29 +05:30
Jill Osborne 9c8e6bb000
Addition to Multitenancy considerations doc (#12567)
* Small addition to Multitenancy considerations doc

* Update docs/querying/multitenancy.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update multitenancy.md

Edit suggested by @kfaraz

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-06-02 10:32:14 -07:00
Dr. Sizzles 7291c92f4f
Adding zstandard compression library (#12408)
* Adding zstandard compression library

* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.

* Fixing zstandard version to latest stable version in pom's and updating license files

* Removing zstd from benchmarks and adding to processing (poms)

* fix the intellij inspection issue

* Removing the prefix v for the version in the license check for ztsd

* Fixing license checks

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
2022-05-28 17:01:44 -07:00
Agustin Gonzalez 2f3d7a4c07
Emit state of replace and append for native batch tasks (#12488)
* Emit state of replace and append for native batch tasks

* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)

* Add metric to compaction job

* Avoid null ptr exc when null emitter

* Coverage

* Emit tombstone & segment counts

* Tasks need a type

* Spelling

* Integrate BatchIngestionMode in batch ingestion tasks functionality

* Typos

* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.

* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.

* Spelling

* Avoid polluting the Task interface

* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
2022-05-23 12:32:47 -07:00
Gian Merlino 37853f8de4
ConcurrentGrouper: Add mergeThreadLocal option, fix bug around the switch to spilling. (#12513)
* ConcurrentGrouper: Add option to always slice up merge buffers thread-locally.

Normally, the ConcurrentGrouper shares merge buffers across processing
threads until spilling starts, and then switches to a thread-local model.
This minimizes memory use and reduces likelihood of spilling, which is
good, but it creates thread contention. The new mergeThreadLocal option
causes a query to start in thread-local mode immediately, and allows us
to experiment with the relative performance of the two modes.

* Fix grammar in docs.

* Fix race in ConcurrentGrouper.

* Fix issue with timeouts.

* Remove unused import.

* Add "tradeoff" to dictionary.
2022-05-21 10:28:54 -07:00
Katya Macedo 5073cee73f
Fix zookeeper spelling (#12556) 2022-05-21 16:14:02 +08:00
Gian Merlino 65a1375b67
SQL: Add is_active to sys.segments, update examples and docs. (#11550)
* SQL: Add is_active to sys.segments, update examples and docs.

is_active is short for:

  (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1

It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.

The web console already adds this filter to a lot of its queries,
proving its usefulness.

This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.

* Wording updates.

* Adjustments for spellcheck.

* Adjust IT.
2022-05-19 14:23:28 -07:00
Charles Smith 3e8d7a6d9f
Sql docs items (#12530)
* touch up sql refactor

* brush up SQL refactor

* incorporate feedback

* reorder sql

* Update docs/querying/sql.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-05-17 16:56:31 -07:00
Katya Macedo 177638f171
Fix typo, add comma (#12529) 2022-05-17 16:42:47 -07:00
Gian Merlino fdfecfd996
Improved docs for range partitioning. (#12350)
* Improved docs for range partitioning.

1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.

* Additional clarification.

* Update other section.

* Another adjustment.

* Updates from review.
2022-05-16 09:42:31 -07:00
Hellmar Becker 985640f103
Clarify the use of the Lookup API (#12088)
* Update lookups.md

* Update docs/querying/lookups.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/querying/lookups.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2022-05-16 07:50:24 -07:00
317brian 351e57bdb6
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)
* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"

This reverts commit cadd1fdc60.

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

fix spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-05-16 07:48:33 -07:00
Gian Merlino 5b6727f319
Enable vectorized virtual column processing by default. (#12520)
In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue
2022-05-16 15:43:53 +05:30
Frank Chen c33ff1c745
Enforce console logging for peon process (#12067)
Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log.

But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage.

So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.
2022-05-16 15:07:21 +05:30
Gian Merlino ff253fd8a3
Add setProcessingThreadNames context parameter. (#12514)
setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
2022-05-16 13:42:00 +05:30