* MSQ: Fix task lock checking during publish, fix lock priority.
Fixes two issues:
1) ControllerImpl did not properly check the return value of
SegmentTransactionalInsertAction when doing a REPLACE. This could cause
it to not realize that its locks were preempted.
2) Task lock priority was the default of 0. It should be the higher
batch default of 50. The low priority made it possible for MSQ tasks
to be preempted by compaction tasks, which is not desired.
* Restructuring, add docs.
* Add performSegmentPublish tests.
* Fix tests.
* Compaction: Fetch segments one at a time on main task; skip when possible.
Compact tasks include the ability to fetch existing segments and determine
reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec.
This is a useful feature that makes compact tasks work well even when the
user running the compaction does not have a clear idea of what they want
the compacted segments to be like.
However, this comes at a cost: it takes time, and disk space, to do all
of these fetches. This patch improves the situation in two ways:
1) When segments do need to be fetched, download them one at a time and
delete them when we're done. This still takes time, but minimizes the
required disk space.
2) Don't fetch segments on the main compact task when they aren't needed.
If the user provides a full granularitySpec, dimensionsSpec, and
metricsSpec, we can skip it.
* Adjustments.
* Changes from code review.
* Fix logic for determining rollup.
* MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters.
This consideration is important, because otherwise we can run out of
memory due to large statistics-tracking objects.
* Improved calculations.
* Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.
These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.
This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.
As to impact, Druid finalizes aggregators in two cases:
- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
post-aggregator (not any other kind of post-aggregator).
With this patch, the functions will no longer be finalized in these cases.
The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.
So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.
Other changes:
1) Move various argument-checking logic from runtime to planning time in
DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.
2) Add various JsonIgnores to the sketches to simplify their JSON representations.
3) Allow chaining of ExpressionPostAggregators and other PostAggregators
in the SQL layer.
4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
now that expressions can operate on complex inputs.
5) Adjust return type to thetaSketch (instead of OTHER) in
ThetaSketchSetBaseOperatorConversion.
* Fix benchmark class.
* Fix compilation error.
* Fix ThetaSketchSqlAggregatorTest.
* Hopefully fix ITAutoCompactionTest.
* Adjustment to ITAutoCompactionTest.
* Use lookup memory footprint in MSQ memory computations.
Two main changes:
1) Add estimateHeapFootprint to LookupExtractor.
2) Use this in MSQ's IndexerWorkerContext when determining the total
amount of available memory. It's taken off the top.
This prevents MSQ tasks from running out of memory when there are lookups
defined in the cluster.
* Updates from code review.
* Support for middle manager less druid, tasks launch as k8s jobs
* Fixing forking task runner test
* Test cleanup, dependency cleanup, intellij inspections cleanup
* Changes per PR review
Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support
* Removing un-needed log lines
* Small changes per PR review
* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly
* Merge conflict fix
* Fixing tests and docs
* update tiny-cluster.yaml
changed `enableTaskLevelLogPush` to `encapsulatedTask`
* Apply suggestions from code review
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Minor changes per PR request
* Cleanup, adding test to AbstractTask
* Add comment in peon.sh
* Bumping code coverage
* More tests to make code coverage happy
* Doh a duplicate dependnecy
* Integration test setup is weird for k8s, will do this in a different PR
* Reverting back all integration test changes, will do in anotbher PR
* use StringUtils.base64 instead of Base64
* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* introduce a "tree" type to the flattenSpec
* feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard
* feedback - expand docs to more clearly capture limitations of "tree" flattenSpec
* feedback - fix for typo on docs
* introduce a comment to explain defensive copy, tweak null handling
* fix: part of rebase
* mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type
* fix: objectflattener restore previous behavior to call getRootField for root type
* docs: ingestion/data-formats add note that ORC only supports path expressions
* chore: linter remove unused import
* fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark
* add FrontCodedIndexed for delta string encoding
* now for actual segments
* fix indexOf
* fixes and thread safety
* add bucket size 4, which seems generally better
* fixes
* fixes maybe
* update indexes to latest interfaces
* utf8 support
* adjust
* oops
* oops
* refactor, better, faster
* more test
* fixes
* revert
* adjustments
* fix prefixing
* more chill
* sql nested benchmark too
* refactor
* more comments and javadocs
* better get
* remove base class
* fix
* hot rod
* adjust comments
* faster still
* minor adjustments
* spatial index support
* spotbugs
* add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs
* fix docs
* push into constructor
* use base buffer instead of copy
* oops
Tracking additional improvements requested by @paul-rogers: #13239
* api: refactor page so that indented bullet is child and unindented portion is parent
* get rid of post etc headings and combine them with the endpoint
* Update docs/operations/api-reference.md
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* fix broken links
* fix typo
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Async reads for JDBC:
Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.
Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
* add a note to the documentation about pre-built HLLSketches
Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.
* copyedit from the King's English to American English
* add suggested style changes
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* update log4j example
* fix some style issues
* Update docs/configuration/logging.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>
* Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns
* Update docs/querying/sql-aggregations.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Various documentation updates.
1) Split out "data management" from "ingestion". Break it into thematic pages.
2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
all conceptual content is in concepts.md and all syntax content is in reference.md.
Shorten the known issues page to the most interesting ones.
3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.
4) Rename various mentions of "Druid console" to "web console".
5) Add additional information to ingestion/partitioning.md.
6) Remove a mention of Tranquility.
7) Remove a note about upgrading to Druid 0.10.1.
8) Remove no-longer-relevant task types from ingestion/tasks.md.
9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.
10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
in the sidebar.
11) Make all br tags self-closing.
12) Certain other cosmetic changes.
13) Update to node-sass 7.
* make travis use node12 for docs
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
* remove things that do not apply
* fix more things
* pin node to a working version
* fix
* fixes
* known issues tidy up
* revert auto formatting changes
* remove management-uis page which is 100% lies
* don't mention the Coordinator console (that no longer exits)
* goodies
* fix typo
* prometheus-emitter supports sending metrics to pushgateway regularly and continuously
* spell check fix
* Optimization variable name and related documents
* Update docs/development/extensions-contrib/prometheus.md
OK, it looks more conspicuous
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update doc
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* When PrometheusEmitter is closed, close the scheduler
* Ensure that registeredMetrics is thread safe.
* Local variable name optimization
* Remove unnecessary white space characters
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update tutorial-kafka.md
Added missing command to the doc for zookeeper before starting kafka
* Update docs/tutorials/tutorial-kafka.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Add interpolation to JsonConfigurator
* Fix checkstyle
* Fix tests by removing common-text override
* Add back commons-text without version
* Remove unused hadoopDir configs
* Move some stuff to hopefully pass coverage
* more consistent expression error messages
* review stuff
* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff
* fixes
* add expression transform name to transformer failure, better parse_json error messaging
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: brian.le <brian.le@imply.io>
Compressed Big Decimal is an extension which provides support for
Mutable big decimal value that can be used to accumulate values
without losing precision or reallocating memory. This type helps in
absolute precision arithmetic on large numbers in applications,
where greater level of accuracy is required, such as financial
applications, currency based transactions. This helps avoid rounding
issues where in potentially large amount of money can be lost.
Accumulation requires that the two numbers have the same scale,
but does not require that they are of the same size. If the value
being accumulated has a larger underlying array than this value
(the result), then the higher order bits are dropped, similar to what
happens when adding a long to an int and storing the result in an
int. A compressed big decimal that holds its data with an embedded
array.
Compressed big decimal is an absolute number based complex type
based on big decimal in Java. This supports all the functionalities
supported by Java Big Decimal. Java Big Decimal is not mutable in
order to avoid big garbage collection issues. Compressed big decimal
is needed to mutate the value in the accumulator.
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
using one property for hiding properties, updated the index.md to document hiddenProperties
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Added java docs
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties
fixed typo and removed redundant space
Co-authored-by: zemin <zemin.piao@adyen.com>
Two improvements:
- Use a realistic targetRowsPerSegment, so if people copy and paste
the example from the docs, it will generate reasonable segments.
- Spell "countryName" correctly.
The docs say Java 17 support is experimental, and give tips on running
successfully with Java 17.
This patch also removes java.base/jdk.internal.perf and
jdk.management/com.sun.management.internal from the list of required
exports and opens, because they were formerly needed for JvmMonitor,
which was rewritten in #12481 to use MXBeans instead.
* json_value adjustments
changes:
* native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type
* rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument
* fix json_value_any to be usable by humans too, coverage
* fix bug
* checkstyle
* checkstyle
* review stuff
* validate that options to json_value are the supported options rather than ignore them
* remove more legacy undocumented functions
* KLL sketch
* added documentation
* direct static refs
* direct static refs
* fixed test
* addressed review points
* added KLL sketch related terms
* return a copy from get
* Copy unions when returning them from "get".
* Remove redundant "final".
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Added link to the relevant section of the Basic Cluster Tuning page on each process page.
This is in order to improve access to this information, which is not easy to find through search or nav.
* Introduce defaultOnDiskStorage config for groupBy
* add debug log to groupby query config
* Apply config change suggestion from review
* Remove accidental new lines
* update default value of new default disk storage config
* update debug log to have more descriptive text
* Make maxOnDiskStorage and defaultOnDiskStorage HumanRedadableBytes
* improve test coverage
* Provide default implementation to new default method on advice of reviewer
* change kafka lookups module to not commit offsets
The current behaviour of the Kafka lookup extractor is to not commit
offsets by assigning a unique ID to the consumer group and setting
auto.offset.reset to earliest. This does the job but also pollutes the
Kafka broker with a bunch of "ghost" consumer groups that will never again be
used.
To fix this, we now set enable.auto.commit to false, which prevents the
ghost consumer groups being created in the first place.
* update docs to include new enable.auto.commit setting behaviour
* update kafka-lookup-extractor documentation
Provide some additional detail on functionality and configuration.
Hopefully this will make it clearer how the extractor works for
developers who aren't so familiar with Kafka.
* add comments better explaining the logic of the code
* add spelling exceptions for kafka lookup docs
* remove kafka lookup records from factory when record tombstoned
* update kafka lookup docs to include tombstone behaviour
* change test wait time down to 10ms
Co-authored-by: David Palmer <david.palmer@adscale.co.nz>
* Adjust "in" filter null behavior to match "selector".
Now, both of them match numeric nulls if constructed with a "null" value.
This is consistent as far as native execution goes, but doesn't match
the behavior of SQL = and IN. So, to address that, this patch also
updates the docs to clarify that the native filters do match nulls.
This patch also updates the SQL docs to describe how Boolean logic is
handled in addition to how NULL values are handled.
Fixes#12856.
* Fix test.
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.
Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.
If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.
These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.
However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:
The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)
Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported
Java versions, garbage collection, and strong encapsulation..
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so
tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility
as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not
just Exceptions. This is important because an IllegalAccessError is
encountered when the correct "--add-opens" line is not provided,
which would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray
instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments.
* Use run-java in more places.
* Add run-java.
* Update .gitignore.
* Exclude hadoop-client-api.
Brought in when building on Java 17.
* Swap one more usage of java.
* Fix the run-java script.
* Fix flag.
* Include link to Temurin.
* Spelling.
* Update examples/bin/run-java
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Use nonzero default value of maxQueuedBytes.
The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default is unlimited; this patch changes it to a
relatively conservative 25MB.
This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large resultsets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.
* Alter calculation.
* Add clarification for combining input source
* Update inputFormat note
* Update maxNumConcurrentSubTasks note
* Fix broken link
* Update docs/ingestion/native-batch-input-source.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* initial commit of bucket dimensions for metrics
return counts of segments that have rowcount in a bucket size for a datasource
return average value of rowcount per segment in a datasource
added unit test
naming could use a lot of work
buckets right now are not finalized
added javadocs
altered metrics.md
* fix checkstyle issues
* addressed review comments
add monitor test
move added functionality to new monitor
update docs
* address comments
renamed monitor
handle tombstones better
update docs
added javadocs
* Add support for tombstones in the segment distribution
* undo changes to tombstone segmentizer factory
* fix accidental whitespacing changes
* address comments regarding metrics documentation
and rename variable to be more accurate
* fix tests
* fix checkstyle issues
* fix broken test
* undo removal of timeout
* Automatic sizing for GroupBy dictionary sizes.
Merging and selector dictionary sizes currently both default to 100MB.
This is not optimal, because it can lead to OOM on small servers and
insufficient resource utilization on larger servers. It also invites
end users to try to tune it when queries run out of dictionary space,
which can make things worse if the end user sets it to too high.
So, this patch:
- Adds automatic tuning for selector and merge dictionaries. Selectors
use up to 15% of the heap and merge buffers use up to 30% of the heap
(aggregate across all queries).
- Updates out-of-memory error messages to emphasize enabling disk
spilling vs. increasing memory parameters. With the memory parameters
automatically sized, it is more likely that an end user will get
benefit from enabling disk spilling.
- Removes the query context parameters that allow lowering of configured
dictionary sizes. These complicate the calculation, and I don't see a
reasonable use case for them.
* Adjust tests.
* Review adjustments.
* Additional comment.
* Remove unused import.
* IMPLY-12348: Updated description of UNION ALL
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update sql.md
* Update docs/querying/sql.md
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable.
Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord.
This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files.
I am using the glob notation to be consistent with the LocalFirehose syntax.
* Add TIME_IN_INTERVAL SQL operator.
The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.
* SqlParserPos cannot be null here.
* Remove unused import.
* Doc updates.
* Add words to dictionary.
* Service stdout log files, move logs to log/.
Two changes that make log behavior cleaner:
1) Redirect messages from the Java runtime to their own log files.
Otherwise, they would get jumbled up in the output of the all-in-one
start command.
2) Use log/ instead of bin/log/ for the default log directory. Makes them
easier to find.
Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.
* Spelling.
* See if code formatting affects spelling.
* Small addition to Multitenancy considerations doc
* Update docs/querying/multitenancy.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update multitenancy.md
Edit suggested by @kfaraz
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Adding zstandard compression library
* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.
* Fixing zstandard version to latest stable version in pom's and updating license files
* Removing zstd from benchmarks and adding to processing (poms)
* fix the intellij inspection issue
* Removing the prefix v for the version in the license check for ztsd
* Fixing license checks
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
* Emit state of replace and append for native batch tasks
* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)
* Add metric to compaction job
* Avoid null ptr exc when null emitter
* Coverage
* Emit tombstone & segment counts
* Tasks need a type
* Spelling
* Integrate BatchIngestionMode in batch ingestion tasks functionality
* Typos
* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.
* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.
* Spelling
* Avoid polluting the Task interface
* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
* ConcurrentGrouper: Add option to always slice up merge buffers thread-locally.
Normally, the ConcurrentGrouper shares merge buffers across processing
threads until spilling starts, and then switches to a thread-local model.
This minimizes memory use and reduces likelihood of spilling, which is
good, but it creates thread contention. The new mergeThreadLocal option
causes a query to start in thread-local mode immediately, and allows us
to experiment with the relative performance of the two modes.
* Fix grammar in docs.
* Fix race in ConcurrentGrouper.
* Fix issue with timeouts.
* Remove unused import.
* Add "tradeoff" to dictionary.
* SQL: Add is_active to sys.segments, update examples and docs.
is_active is short for:
(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1
It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.
The web console already adds this filter to a lot of its queries,
proving its usefulness.
This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.
* Wording updates.
* Adjustments for spellcheck.
* Adjust IT.
* Improved docs for range partitioning.
1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.
* Additional clarification.
* Update other section.
* Another adjustment.
* Updates from review.
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"
This reverts commit cadd1fdc60.
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
fix spelling
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
In the majority of cases, this improves performance.
There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.
IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue
Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log.
But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage.
So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.
setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.
A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
* Add feature flag for sql planning of TimeBoundary queries
* fixup! Add feature flag for sql planning of TimeBoundary queries
* Add documentation for enableTimeBoundaryPlanning
* fixup! Add documentation for enableTimeBoundaryPlanning
This PR is to measure how long a task stays in the pending queue and emits the value with the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.
An example of the metric:
```
2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}
```
------------------------------------------
Key changed/added classes in this PR
Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
Update related factory classes and tests.
* Reduce allocations due to Jackson serialization.
This patch attacks two sources of allocations during Jackson
serialization:
1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
DefaultSerializerProvider instance for each call. It has lots of
fields and creates pressure on the garbage collector. So, this patch
adds helper functions in JacksonUtils that enable reuse of
SerializerProvider objects and updates various call sites to make
use of this.
2) GroupByQueryToolChest copies the ObjectMapper for every query to
install a special module that supports backwards compatibility with
map-based rows. This isn't needed if resultAsArray is set and
all servers are running Druid 0.16.0 or later. This release was a
while ago. So, this patch disables backwards compatibility by default,
which eliminates the need to copy the heavyweight ObjectMapper. The
patch also introduces a configuration option that allows admins to
explicitly enable backwards compatibility.
* Add test.
* Update additional call sites and add to forbidden APIs.
* Update caching.md
Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900
Update caching.md
A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300
* Update caching.md
Typos
* Amendments on the segment cache
Significant updates on content around the segment cache, pull process, and in-memory cache
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md
typo
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update
Made more succinct and removed specific config to change.
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update ingestion-spec.md
Added indexSpecForIntermediatePersists as a common configuration property.
* Update ingestion-spec.md
Amended to remove "below" and add link to the table.
* Update ingestion-spec.md
Removed passive.
* Optionally load segment index files into page cache on bootstrap and new segment download
* Fix unit test failure
* Fix test case
* fix spelling
* fix spelling
* fix test and test coverage issues
Co-authored-by: Jian Wang <wjhypo@gmail.com>
* fix(docs): clarify what s3 permissions are needed based on the permissions model
* fix typo
* Update docs/development/extensions-core/s3.md
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* add data format and example for featureSpec
* add second feature in example
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
amazon-kinesis-client was not covered undered the apache license and required separate insertion in the kinesis extension.
This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities.
Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.
The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.
The default value is now increased to Long.MAX_VALUE.
listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161.
However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).
A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.
* Counting nulls in String cardinality with a config
* Adding tests for the new config
* Wrapping the vectorize part to allow backward compatibility
* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes
* Updating testcase and code
* Adding null handling test to improve coverage
* Checkstyle fix
* Adding 1 more change in docs
* Making docs clearer
* Docs: Masking S3 creds and some rewording
Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688
* Removed bold in one of the quote sections
* Update s3.md
* Update s3.md
Quick grammar change
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md
Typo
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md
Active lang
* Update s3.md
LAng nit
* Update native-batch.md
LAng nit
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Grammar tidy-up and link fix
Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes.
* Update docs/development/extensions-core/s3.md
* Update s3.md
Removed an Erroneous E
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update math-expr.md
Link back to transformSpec
* Update ingestion-spec.md
Moved info about using the timestamp inside transforms into the actual timestamp section.
* Update ingestion-spec.md
Active language.
* Update ingestion-spec.md
Added best practice point to dimensions description.
* Update docs/ingestion/ingestion-spec.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* add docs for request logging
* remove stray character
* Update docs/operations/request-logging.md
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.
Add config for eager / lazy connection initialization in ResourcePool
Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.
While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.
Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.
A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.
If set to false, lazy initialization of connection resources takes place.
NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR
Algorithm
The current implementation relies on the creation of maxSize resources eagerly.
The new implementation's behaviour is as follows:
If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
* Tombstone support for replace functionality
* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec
* Update compaction test to match replace behavior
* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.
* Style plus simple queriableindex test
* Add segment cache loader tombstone test
* Add more tests
* Add a method to the LogicalSegment to test whether it has any data
* Test filter with some empty logical segments
* Refactor more compaction/dropexisting tests
* Code coverage
* Support for all empty segments
* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.
* Fix null ptr when segment does not have a queriable index
* Add support for empty replace interval (all input data has been filtered out)
* Fixed coverage & style
* Find tombstone versions from lock versions
* Test failures & style
* Interner was making this fail since the two segments were consider equal due to their id's being equal
* Cleanup tombstone version code
* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used
* Reject replace spec when input intervals are empty
* Documentation
* Style and unit test
* Restore test code deleted by mistake
* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.
* Unused imports. Dead code. Test coverage.
* Coverage.
* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.
* Fix OmniKiller + more test coverage.
* Tombstones are now marked using a shard spec
* Drop a segment factory.json in the segment cache for tombstones
* Style
* Style + coverage
* style
* Add TombstoneLoadSpec.class to mapper in test
* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java
Typo
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md
Missing
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo
* Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold.
* Range does not work with multi-dim
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* GroupBy: Cap dictionary-building selector memory usage.
New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.
Includes:
- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
the v2SmallDictionary suite. (Both the selector dictionary and
the merging dictionary will be small in that suite.)
- Tests for the new config parameter.
* Fix issues from tests.
* Add "pre-existing" to dictionary.
* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.
* Adjustments from review comments.
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.
The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
* add a new query laning metrics to visualize lane assignment
* fixes :spotbugs check
* Update docs/operations/metrics.md
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Azure Blob storage has multiple modes of authentication. One of them is Shared access resource
. This is very useful in cases when we do not want to add the account key in the druid properties .
As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key.
Added a config groupByEnableMultiValueUnnesting which can be set in the query context.
The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior.
If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.
* Moving in filter check to broker
* Adding more unit tests, making error message meaningful
* Spelling and doc changes
* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100
* Removing upper limit of 100, updated docs
* Making documentation more meaningful
* Moving check outside to PlannerConfig, updating test cases and adding back max limit
* Updated with some additional code comments
* Missed removing one line during the checkin
* Addressing doc changes and one forbidden API correction
* Final doc change
* Adding a speling exception, correcting a testcase
* Reading entire filter tree to address combinations of ANDs and ORs
* Specifying in docs that, this case works only for ORs
* Revert "Reading entire filter tree to address combinations of ANDs and ORs"
This reverts commit 81ca8f8496.
* Covering a class cast exception and updating docs
* Counting changed
Co-authored-by: Jihoon Son <jihoonson@apache.org>
In extreme cases where many parallel indexing jobs are submitted together, it is possible
that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule
their own sub-tasks thus stalling progress of all the indexing jobs.
Key changes:
- Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots
for `ParallelIndexSupervisorTasks` per worker
- `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior)
- `ratio = 0` implies supervisor tasks can not use any slot on a worker
(actually, at least 1 slot is always available to ensure progress of parallel indexing jobs)
- `ImmutableWorkerInfo.canRunTask()`
- `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`
* refactor and link fixes
* add sql docs to left nav
* code format for needle
* updated web console script
* link fixes
* update earliest/latest functions
* edits for grammar and style
* more link fixes
* another link
* update with #12226
* update .spelling file
* array_concat_agg and array_agg support for array inputs
changes:
* added array_concat_agg to aggregate arrays into a single array
* added array_agg support for array inputs to make nested array
* added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating
* fix test
* tie capabilities type to legacy mode flag about coercing arrays to strings
* oops
* better javadoc
* add new doc
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* reorder query laning properties
* rename doc
* new name in doc header
* organize material into "service tiering" section
* text edits and update sidebars.json
* update query laning
* how queries get assigned to lanes
* add more details to intro; use more consistent terminology
* more content
* Apply suggestions from code review
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Update docs/operations/mixed-workloads.md
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* typo
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Use Druid's extension loading for integration test instead of maven
* fix maven command
* override config path
* load input format extensions and kafka by default; add prepopulated-data group
* all docker-composes are overridable
* fix s3 configs
* override config for all
* fix docker_compose_args
* fix security tests
* turn off debug logs for overlord api calls
* clean up stuff
* revert docker-compose.yml
* fix override config for query error test; fix circular dependency in docker compose
* add back some dependencies in docker compose
* new maven profile for integration test
* example file filter
* Thread pool for broker
* Updating two tests to improve coverage for new method added
* Updating druidProcessingConfigTest to cover coverage
* Adding missed spelling errors caused in doc
* Adding test to cover lines of new function added