* MSQ: Fix task lock checking during publish, fix lock priority.
Fixes two issues:
1) ControllerImpl did not properly check the return value of
SegmentTransactionalInsertAction when doing a REPLACE. This could cause
it to not realize that its locks were preempted.
2) Task lock priority was the default of 0. It should be the higher
batch default of 50. The low priority made it possible for MSQ tasks
to be preempted by compaction tasks, which is not desired.
* Restructuring, add docs.
* Add performSegmentPublish tests.
* Fix tests.
* Compaction: Fetch segments one at a time on main task; skip when possible.
Compact tasks include the ability to fetch existing segments and determine
reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec.
This is a useful feature that makes compact tasks work well even when the
user running the compaction does not have a clear idea of what they want
the compacted segments to be like.
However, this comes at a cost: it takes time, and disk space, to do all
of these fetches. This patch improves the situation in two ways:
1) When segments do need to be fetched, download them one at a time and
delete them when we're done. This still takes time, but minimizes the
required disk space.
2) Don't fetch segments on the main compact task when they aren't needed.
If the user provides a full granularitySpec, dimensionsSpec, and
metricsSpec, we can skip it.
* Adjustments.
* Changes from code review.
* Fix logic for determining rollup.
* MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters.
This consideration is important, because otherwise we can run out of
memory due to large statistics-tracking objects.
* Improved calculations.
* Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.
These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.
This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.
As to impact, Druid finalizes aggregators in two cases:
- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
post-aggregator (not any other kind of post-aggregator).
With this patch, the functions will no longer be finalized in these cases.
The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.
So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.
Other changes:
1) Move various argument-checking logic from runtime to planning time in
DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.
2) Add various JsonIgnores to the sketches to simplify their JSON representations.
3) Allow chaining of ExpressionPostAggregators and other PostAggregators
in the SQL layer.
4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
now that expressions can operate on complex inputs.
5) Adjust return type to thetaSketch (instead of OTHER) in
ThetaSketchSetBaseOperatorConversion.
* Fix benchmark class.
* Fix compilation error.
* Fix ThetaSketchSqlAggregatorTest.
* Hopefully fix ITAutoCompactionTest.
* Adjustment to ITAutoCompactionTest.
* Use lookup memory footprint in MSQ memory computations.
Two main changes:
1) Add estimateHeapFootprint to LookupExtractor.
2) Use this in MSQ's IndexerWorkerContext when determining the total
amount of available memory. It's taken off the top.
This prevents MSQ tasks from running out of memory when there are lookups
defined in the cluster.
* Updates from code review.
* Support for middle manager less druid, tasks launch as k8s jobs
* Fixing forking task runner test
* Test cleanup, dependency cleanup, intellij inspections cleanup
* Changes per PR review
Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support
* Removing un-needed log lines
* Small changes per PR review
* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly
* Merge conflict fix
* Fixing tests and docs
* update tiny-cluster.yaml
changed `enableTaskLevelLogPush` to `encapsulatedTask`
* Apply suggestions from code review
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Minor changes per PR request
* Cleanup, adding test to AbstractTask
* Add comment in peon.sh
* Bumping code coverage
* More tests to make code coverage happy
* Doh a duplicate dependnecy
* Integration test setup is weird for k8s, will do this in a different PR
* Reverting back all integration test changes, will do in anotbher PR
* use StringUtils.base64 instead of Base64
* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* introduce a "tree" type to the flattenSpec
* feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard
* feedback - expand docs to more clearly capture limitations of "tree" flattenSpec
* feedback - fix for typo on docs
* introduce a comment to explain defensive copy, tweak null handling
* fix: part of rebase
* mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type
* fix: objectflattener restore previous behavior to call getRootField for root type
* docs: ingestion/data-formats add note that ORC only supports path expressions
* chore: linter remove unused import
* fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark
* add FrontCodedIndexed for delta string encoding
* now for actual segments
* fix indexOf
* fixes and thread safety
* add bucket size 4, which seems generally better
* fixes
* fixes maybe
* update indexes to latest interfaces
* utf8 support
* adjust
* oops
* oops
* refactor, better, faster
* more test
* fixes
* revert
* adjustments
* fix prefixing
* more chill
* sql nested benchmark too
* refactor
* more comments and javadocs
* better get
* remove base class
* fix
* hot rod
* adjust comments
* faster still
* minor adjustments
* spatial index support
* spotbugs
* add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs
* fix docs
* push into constructor
* use base buffer instead of copy
* oops
Tracking additional improvements requested by @paul-rogers: #13239
* api: refactor page so that indented bullet is child and unindented portion is parent
* get rid of post etc headings and combine them with the endpoint
* Update docs/operations/api-reference.md
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* fix broken links
* fix typo
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Async reads for JDBC:
Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.
Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
* add a note to the documentation about pre-built HLLSketches
Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.
* copyedit from the King's English to American English
* add suggested style changes
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* update log4j example
* fix some style issues
* Update docs/configuration/logging.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>
* Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns
* Update docs/querying/sql-aggregations.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Various documentation updates.
1) Split out "data management" from "ingestion". Break it into thematic pages.
2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
all conceptual content is in concepts.md and all syntax content is in reference.md.
Shorten the known issues page to the most interesting ones.
3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.
4) Rename various mentions of "Druid console" to "web console".
5) Add additional information to ingestion/partitioning.md.
6) Remove a mention of Tranquility.
7) Remove a note about upgrading to Druid 0.10.1.
8) Remove no-longer-relevant task types from ingestion/tasks.md.
9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.
10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
in the sidebar.
11) Make all br tags self-closing.
12) Certain other cosmetic changes.
13) Update to node-sass 7.
* make travis use node12 for docs
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
* remove things that do not apply
* fix more things
* pin node to a working version
* fix
* fixes
* known issues tidy up
* revert auto formatting changes
* remove management-uis page which is 100% lies
* don't mention the Coordinator console (that no longer exits)
* goodies
* fix typo
* prometheus-emitter supports sending metrics to pushgateway regularly and continuously
* spell check fix
* Optimization variable name and related documents
* Update docs/development/extensions-contrib/prometheus.md
OK, it looks more conspicuous
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update doc
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* When PrometheusEmitter is closed, close the scheduler
* Ensure that registeredMetrics is thread safe.
* Local variable name optimization
* Remove unnecessary white space characters
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update tutorial-kafka.md
Added missing command to the doc for zookeeper before starting kafka
* Update docs/tutorials/tutorial-kafka.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Add interpolation to JsonConfigurator
* Fix checkstyle
* Fix tests by removing common-text override
* Add back commons-text without version
* Remove unused hadoopDir configs
* Move some stuff to hopefully pass coverage
* more consistent expression error messages
* review stuff
* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff
* fixes
* add expression transform name to transformer failure, better parse_json error messaging
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: brian.le <brian.le@imply.io>
Compressed Big Decimal is an extension which provides support for
Mutable big decimal value that can be used to accumulate values
without losing precision or reallocating memory. This type helps in
absolute precision arithmetic on large numbers in applications,
where greater level of accuracy is required, such as financial
applications, currency based transactions. This helps avoid rounding
issues where in potentially large amount of money can be lost.
Accumulation requires that the two numbers have the same scale,
but does not require that they are of the same size. If the value
being accumulated has a larger underlying array than this value
(the result), then the higher order bits are dropped, similar to what
happens when adding a long to an int and storing the result in an
int. A compressed big decimal that holds its data with an embedded
array.
Compressed big decimal is an absolute number based complex type
based on big decimal in Java. This supports all the functionalities
supported by Java Big Decimal. Java Big Decimal is not mutable in
order to avoid big garbage collection issues. Compressed big decimal
is needed to mutate the value in the accumulator.
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
using one property for hiding properties, updated the index.md to document hiddenProperties
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Added java docs
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties
fixed typo and removed redundant space
Co-authored-by: zemin <zemin.piao@adyen.com>
Two improvements:
- Use a realistic targetRowsPerSegment, so if people copy and paste
the example from the docs, it will generate reasonable segments.
- Spell "countryName" correctly.
The docs say Java 17 support is experimental, and give tips on running
successfully with Java 17.
This patch also removes java.base/jdk.internal.perf and
jdk.management/com.sun.management.internal from the list of required
exports and opens, because they were formerly needed for JvmMonitor,
which was rewritten in #12481 to use MXBeans instead.
* json_value adjustments
changes:
* native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type
* rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument
* fix json_value_any to be usable by humans too, coverage
* fix bug
* checkstyle
* checkstyle
* review stuff
* validate that options to json_value are the supported options rather than ignore them
* remove more legacy undocumented functions
* KLL sketch
* added documentation
* direct static refs
* direct static refs
* fixed test
* addressed review points
* added KLL sketch related terms
* return a copy from get
* Copy unions when returning them from "get".
* Remove redundant "final".
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Added link to the relevant section of the Basic Cluster Tuning page on each process page.
This is in order to improve access to this information, which is not easy to find through search or nav.
* Introduce defaultOnDiskStorage config for groupBy
* add debug log to groupby query config
* Apply config change suggestion from review
* Remove accidental new lines
* update default value of new default disk storage config
* update debug log to have more descriptive text
* Make maxOnDiskStorage and defaultOnDiskStorage HumanRedadableBytes
* improve test coverage
* Provide default implementation to new default method on advice of reviewer
* change kafka lookups module to not commit offsets
The current behaviour of the Kafka lookup extractor is to not commit
offsets by assigning a unique ID to the consumer group and setting
auto.offset.reset to earliest. This does the job but also pollutes the
Kafka broker with a bunch of "ghost" consumer groups that will never again be
used.
To fix this, we now set enable.auto.commit to false, which prevents the
ghost consumer groups being created in the first place.
* update docs to include new enable.auto.commit setting behaviour
* update kafka-lookup-extractor documentation
Provide some additional detail on functionality and configuration.
Hopefully this will make it clearer how the extractor works for
developers who aren't so familiar with Kafka.
* add comments better explaining the logic of the code
* add spelling exceptions for kafka lookup docs
* remove kafka lookup records from factory when record tombstoned
* update kafka lookup docs to include tombstone behaviour
* change test wait time down to 10ms
Co-authored-by: David Palmer <david.palmer@adscale.co.nz>
* Adjust "in" filter null behavior to match "selector".
Now, both of them match numeric nulls if constructed with a "null" value.
This is consistent as far as native execution goes, but doesn't match
the behavior of SQL = and IN. So, to address that, this patch also
updates the docs to clarify that the native filters do match nulls.
This patch also updates the SQL docs to describe how Boolean logic is
handled in addition to how NULL values are handled.
Fixes#12856.
* Fix test.
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.
Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.
If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.
These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.
However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:
The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)
Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported
Java versions, garbage collection, and strong encapsulation..
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so
tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility
as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not
just Exceptions. This is important because an IllegalAccessError is
encountered when the correct "--add-opens" line is not provided,
which would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray
instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments.
* Use run-java in more places.
* Add run-java.
* Update .gitignore.
* Exclude hadoop-client-api.
Brought in when building on Java 17.
* Swap one more usage of java.
* Fix the run-java script.
* Fix flag.
* Include link to Temurin.
* Spelling.
* Update examples/bin/run-java
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Use nonzero default value of maxQueuedBytes.
The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default is unlimited; this patch changes it to a
relatively conservative 25MB.
This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large resultsets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.
* Alter calculation.
* Add clarification for combining input source
* Update inputFormat note
* Update maxNumConcurrentSubTasks note
* Fix broken link
* Update docs/ingestion/native-batch-input-source.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>