* Emit state of replace and append for native batch tasks
* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)
* Add metric to compaction job
* Avoid null ptr exc when null emitter
* Coverage
* Emit tombstone & segment counts
* Tasks need a type
* Spelling
* Integrate BatchIngestionMode in batch ingestion tasks functionality
* Typos
* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.
* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.
* Spelling
* Avoid polluting the Task interface
* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
* ConcurrentGrouper: Add option to always slice up merge buffers thread-locally.
Normally, the ConcurrentGrouper shares merge buffers across processing
threads until spilling starts, and then switches to a thread-local model.
This minimizes memory use and reduces likelihood of spilling, which is
good, but it creates thread contention. The new mergeThreadLocal option
causes a query to start in thread-local mode immediately, and allows us
to experiment with the relative performance of the two modes.
* Fix grammar in docs.
* Fix race in ConcurrentGrouper.
* Fix issue with timeouts.
* Remove unused import.
* Add "tradeoff" to dictionary.
* SQL: Add is_active to sys.segments, update examples and docs.
is_active is short for:
(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1
It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.
The web console already adds this filter to a lot of its queries,
proving its usefulness.
This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.
* Wording updates.
* Adjustments for spellcheck.
* Adjust IT.
* Improved docs for range partitioning.
1) Clarify the benefits of range partitioning.
2) Clarify which filters support pruning.
3) Include the fact that multi-value dimensions cannot be used for partitioning.
* Additional clarification.
* Update other section.
* Another adjustment.
* Updates from review.
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"
This reverts commit cadd1fdc60.
* docs(fix): clarify how worker.version and minWorkerVersion comparison works
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
fix spelling
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
In the majority of cases, this improves performance.
There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.
IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue
Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log.
But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage.
So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.
setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.
A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
* Add feature flag for sql planning of TimeBoundary queries
* fixup! Add feature flag for sql planning of TimeBoundary queries
* Add documentation for enableTimeBoundaryPlanning
* fixup! Add documentation for enableTimeBoundaryPlanning
This PR is to measure how long a task stays in the pending queue and emits the value with the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.
An example of the metric:
```
2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}
```
------------------------------------------
Key changed/added classes in this PR
Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
Update related factory classes and tests.
* Reduce allocations due to Jackson serialization.
This patch attacks two sources of allocations during Jackson
serialization:
1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
DefaultSerializerProvider instance for each call. It has lots of
fields and creates pressure on the garbage collector. So, this patch
adds helper functions in JacksonUtils that enable reuse of
SerializerProvider objects and updates various call sites to make
use of this.
2) GroupByQueryToolChest copies the ObjectMapper for every query to
install a special module that supports backwards compatibility with
map-based rows. This isn't needed if resultAsArray is set and
all servers are running Druid 0.16.0 or later. This release was a
while ago. So, this patch disables backwards compatibility by default,
which eliminates the need to copy the heavyweight ObjectMapper. The
patch also introduces a configuration option that allows admins to
explicitly enable backwards compatibility.
* Add test.
* Update additional call sites and add to forbidden APIs.
* Update caching.md
Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900
Update caching.md
A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300
* Update caching.md
Typos
* Amendments on the segment cache
Significant updates on content around the segment cache, pull process, and in-memory cache
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/basic-cluster-tuning.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update basic-cluster-tuning.md
typo
* Update docs/querying/caching.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Whole-query caching update
Made more succinct and removed specific config to change.
* Update docs/design/historical.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update ingestion-spec.md
Added indexSpecForIntermediatePersists as a common configuration property.
* Update ingestion-spec.md
Amended to remove "below" and add link to the table.
* Update ingestion-spec.md
Removed passive.
* Optionally load segment index files into page cache on bootstrap and new segment download
* Fix unit test failure
* Fix test case
* fix spelling
* fix spelling
* fix test and test coverage issues
Co-authored-by: Jian Wang <wjhypo@gmail.com>
* fix(docs): clarify what s3 permissions are needed based on the permissions model
* fix typo
* Update docs/development/extensions-core/s3.md
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* add data format and example for featureSpec
* add second feature in example
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
amazon-kinesis-client was not covered undered the apache license and required separate insertion in the kinesis extension.
This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities.
Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.
The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.
The default value is now increased to Long.MAX_VALUE.
listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161.
However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).
A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.
* Counting nulls in String cardinality with a config
* Adding tests for the new config
* Wrapping the vectorize part to allow backward compatibility
* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes
* Updating testcase and code
* Adding null handling test to improve coverage
* Checkstyle fix
* Adding 1 more change in docs
* Making docs clearer
* Docs: Masking S3 creds and some rewording
Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688
* Removed bold in one of the quote sections
* Update s3.md
* Update s3.md
Quick grammar change
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md
Typo
* Update docs/development/extensions-core/s3.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update s3.md
Active lang
* Update s3.md
LAng nit
* Update native-batch.md
LAng nit
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Grammar tidy-up and link fix
Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes.
* Update docs/development/extensions-core/s3.md
* Update s3.md
Removed an Erroneous E
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update math-expr.md
Link back to transformSpec
* Update ingestion-spec.md
Moved info about using the timestamp inside transforms into the actual timestamp section.
* Update ingestion-spec.md
Active language.
* Update ingestion-spec.md
Added best practice point to dimensions description.
* Update docs/ingestion/ingestion-spec.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* add docs for request logging
* remove stray character
* Update docs/operations/request-logging.md
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: TSFenwick <tsfenwick@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.
Add config for eager / lazy connection initialization in ResourcePool
Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.
While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.
Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.
A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.
If set to false, lazy initialization of connection resources takes place.
NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR
Algorithm
The current implementation relies on the creation of maxSize resources eagerly.
The new implementation's behaviour is as follows:
If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
* Tombstone support for replace functionality
* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec
* Update compaction test to match replace behavior
* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.
* Style plus simple queriableindex test
* Add segment cache loader tombstone test
* Add more tests
* Add a method to the LogicalSegment to test whether it has any data
* Test filter with some empty logical segments
* Refactor more compaction/dropexisting tests
* Code coverage
* Support for all empty segments
* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.
* Fix null ptr when segment does not have a queriable index
* Add support for empty replace interval (all input data has been filtered out)
* Fixed coverage & style
* Find tombstone versions from lock versions
* Test failures & style
* Interner was making this fail since the two segments were consider equal due to their id's being equal
* Cleanup tombstone version code
* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used
* Reject replace spec when input intervals are empty
* Documentation
* Style and unit test
* Restore test code deleted by mistake
* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.
* Unused imports. Dead code. Test coverage.
* Coverage.
* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.
* Fix OmniKiller + more test coverage.
* Tombstones are now marked using a shard spec
* Drop a segment factory.json in the segment cache for tombstones
* Style
* Style + coverage
* style
* Add TombstoneLoadSpec.class to mapper in test
* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java
Typo
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md
Missing
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo
* Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold.
* Range does not work with multi-dim
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* GroupBy: Cap dictionary-building selector memory usage.
New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.
Includes:
- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
the v2SmallDictionary suite. (Both the selector dictionary and
the merging dictionary will be small in that suite.)
- Tests for the new config parameter.
* Fix issues from tests.
* Add "pre-existing" to dictionary.
* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.
* Adjustments from review comments.
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.
The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
* add a new query laning metrics to visualize lane assignment
* fixes :spotbugs check
* Update docs/operations/metrics.md
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Azure Blob storage has multiple modes of authentication. One of them is Shared access resource
. This is very useful in cases when we do not want to add the account key in the druid properties .
As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key.
Added a config groupByEnableMultiValueUnnesting which can be set in the query context.
The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior.
If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.
* Moving in filter check to broker
* Adding more unit tests, making error message meaningful
* Spelling and doc changes
* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100
* Removing upper limit of 100, updated docs
* Making documentation more meaningful
* Moving check outside to PlannerConfig, updating test cases and adding back max limit
* Updated with some additional code comments
* Missed removing one line during the checkin
* Addressing doc changes and one forbidden API correction
* Final doc change
* Adding a speling exception, correcting a testcase
* Reading entire filter tree to address combinations of ANDs and ORs
* Specifying in docs that, this case works only for ORs
* Revert "Reading entire filter tree to address combinations of ANDs and ORs"
This reverts commit 81ca8f8496.
* Covering a class cast exception and updating docs
* Counting changed
Co-authored-by: Jihoon Son <jihoonson@apache.org>
In extreme cases where many parallel indexing jobs are submitted together, it is possible
that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule
their own sub-tasks thus stalling progress of all the indexing jobs.
Key changes:
- Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots
for `ParallelIndexSupervisorTasks` per worker
- `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior)
- `ratio = 0` implies supervisor tasks can not use any slot on a worker
(actually, at least 1 slot is always available to ensure progress of parallel indexing jobs)
- `ImmutableWorkerInfo.canRunTask()`
- `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`
* refactor and link fixes
* add sql docs to left nav
* code format for needle
* updated web console script
* link fixes
* update earliest/latest functions
* edits for grammar and style
* more link fixes
* another link
* update with #12226
* update .spelling file
* array_concat_agg and array_agg support for array inputs
changes:
* added array_concat_agg to aggregate arrays into a single array
* added array_agg support for array inputs to make nested array
* added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating
* fix test
* tie capabilities type to legacy mode flag about coercing arrays to strings
* oops
* better javadoc
* add new doc
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* reorder query laning properties
* rename doc
* new name in doc header
* organize material into "service tiering" section
* text edits and update sidebars.json
* update query laning
* how queries get assigned to lanes
* add more details to intro; use more consistent terminology
* more content
* Apply suggestions from code review
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Update docs/operations/mixed-workloads.md
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* typo
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Use Druid's extension loading for integration test instead of maven
* fix maven command
* override config path
* load input format extensions and kafka by default; add prepopulated-data group
* all docker-composes are overridable
* fix s3 configs
* override config for all
* fix docker_compose_args
* fix security tests
* turn off debug logs for overlord api calls
* clean up stuff
* revert docker-compose.yml
* fix override config for query error test; fix circular dependency in docker compose
* add back some dependencies in docker compose
* new maven profile for integration test
* example file filter
* Thread pool for broker
* Updating two tests to improve coverage for new method added
* Updating druidProcessingConfigTest to cover coverage
* Adding missed spelling errors caused in doc
* Adding test to cover lines of new function added
* Add jsonPath functions support
* Add jsonPath function test for Avro
* Add jsonPath function length() to Orc
* Add jsonPath function length() to Parquet
* Add more tests to ORC format
* update doc
* Fix exception during ingestion
* Add IT test case
* Revert "Fix exception during ingestion"
This reverts commit 5a5484b9ea.
* update IT test case
* Add 'keys()'
* Commit IT test case
* Fix UT
* clean up the balancing code around the batched vs deprecated way of sampling segments to balance
* fix docs, clarify comments, add deprecated annotations to legacy code
* remove unused variable
* update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated
* fix dynamic config text for percentOfSegmentsToConsiderPerMove
* run prettier to cleanup coordinator-dynamic-config.tsx changes
* update jest snapshot
* update documentation per review feedback
* fix bug where queries fail immediately when timeout is 0 instead of using default timeout
* fix to use serverside max
* more better
* less flaky test
* oops
* Metrics docs layout and info about query/bytes
Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages.
Update index.md
Amended typo
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/metrics.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/metrics.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/operations/metrics.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Feedback applied
Http --> HTTP and moved content / removed >
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update rollup.md
Added SE tip around roll-up.
* Update docs/ingestion/rollup.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Make nodeRole available during binding; add support for dynamic registration of DruidService
* fix checkstyle and test
* fix customRole test
* address comments
* add more javadoc
* apply log file rolling strategy
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Use absolute log path and allow spaces in log path
* Update log4j2 configuration
* apply FileAppender to ZooKeeper
* DO NOT redirect application's console log to file in supervisor
changes:
* adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively." 21 is not a power of 2, nor is 12, the given default. I think there may have been confusion because lgK represents log2 of K. We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.
Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side.
This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).
Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.
* add impl
* fix checkstyle
* add test
* add test
* add unit tests
* fix unit tests
* fix unit tests
* fix unit tests
* add IT
* add IT
* add comments
* fix spelling