* Rename field, fix router documentation
* Add more lines to doc
* Apply doc suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Add handoff wait time to ingestion stats report. Refactor some code for batch handoff
* fix checkstyle
* Add assertion to AbstractITBatchIndexTask to make sure report reflects wait for segments happened
* add docs to the task reports section of doc
* Update sql.md
Added two example queries and adjusted formatting of one that was already there
* Update docs/querying/sql.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/querying/sql.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/querying/sql.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update docs/querying/sql.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update sql.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* add MV_FILTER_ONLY SQL function, and list filter virtual column
* MV_FILTER_NONE and more tests
* formatting
* o yeah, forgot can do easy thing
* style
* hmm why was that there
* test filtering on virtual column
* style
* meh
* do it right
* good bot
* Update index.md
Moved H4s underneath the H3 for the task log location and added hyperlinks.
* Update tasks.md
Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead.
* Update tasks.md
.html > .md
* Update docs/ingestion/tasks.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update description of batchProcessingMode
Update the description to explicitly mention a released version of Druid that the original version was referencing
* Update docs/configuration/index.md
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update granularities.md
Link-back to the ingestion spec as well as Native queries plus examples.
* Update docs/querying/granularities.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/granularities.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Make persists concurrent with ingestion
* Remove semaphore but keep concurrent persists (with add) and add push in the backround as well
* Go back to documented default persists (zero)
* Move to debug
* Remove unnecessary Atomics
* Comments on synchronization (or not) for sinks & sinkMetadata
* Some cleanup for unit tests but they still need further work
* Shutdown & wait for persists and push on close
* Provide support for three existing batch appenderators using batchProcessingMode flag
* Fix reference to wrong appenderator
* Fix doc typos
* Add BatchAppenderators class test coverage
* Add log message to batchProcessingMode final value, fix typo in enum name
* Another typo and minor fix to log message
* LEGACY->OPEN_SEGMENTS, Edit docs
* Minor update legacy->open segments log message
* More code comments, mostly small adjustments to naming etc
* fix spelling
* Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage
* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* Configurable maxStreamLength for doubles sketches
* fix equals/hashcode and it test failure
* fix test
* fix it test
* benchmark
* doc
* grouping key
* fix comment
* dependency check
* Update docs/development/extensions-core/datasketches-quantiles.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/querying/sql.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
LGTM
* Update native-batch.md
Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400
* Update native-batch.md
* Fixed broken link + some grammar
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
* Update native-batch.md
Some grammatical wizardry.
* Update native-batch.md
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/ingestion/native-batch.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Apply suggestions from code review
remove orphaned links
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Add details to the Docker tutorial
Added links, explanations and other details to the Docker
tutorial to make it easier for first-time users.
* Fix spelling error
And add "Jupyter" to the spelling dictionary.
* Update docs/tutorials/docker.md
* Update docs/tutorials/docker.md
Co-authored-by: sthetland <steve.hetland@imply.io>
* Update docs/tutorials/docker.md
Co-authored-by: sthetland <steve.hetland@imply.io>
* Update docs/tutorials/docker.md
* Update docs/tutorials/docker.md
Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: sthetland <steve.hetland@imply.io>
* Add prometheus-emitter docs
* Update docs/development/extensions.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Affinity "strong" definition
Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800
* Spelling corrections
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This change updates doc to clarify when and how a change to druid.auth.authenticator.basic.credentialIterations takes effect: changes apply only to new users or existing users upon changing their password via the credentials API, which may not be the expectation.
Fixes#11297.
Description
Description and design in the proposal #11297
Key changed/added classes in this PR
*DataSegmentPusher
*ShuffleClient
*PartitionStat
*PartitionLocation
*IntermediaryDataManager
* add binary_byte_format/decimal_byte_format/decimal_format
* clean code
* fix doc
* fix review comments
* add spelling check rules
* remove extra param
* improve type handling and null handling
* remove extra zeros
* fix tests and add space between unit suffix and number as most size-format functions do
* fix tests
* add examples
* change function names according to review comments
* fix merge
Signed-off-by: frank chen <frank.chen021@outlook.com>
* no need to configure NullHandling explicitly for tests
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix tests in SQL-Compatible mode
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Resolve review comments
* Update SQL test case to check null handling
* Fix intellij inspections
* Add more examples
* Fix example
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.
This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.
To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
* fix little typo
* Update docs/misc/math-expr.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update default maxSegmentsInNodeLoadingQueue
Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100.
An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability.
Since this is the default druid operators need to run into this instability
and then look through the docs to see that the recommended value for a large
cluster is 1000. This change makes it so the default will prevent clusters
from falling over as they grow over time.
* update tests
* codestyle
* Add a note
* Update docs/configuration/index.md
Co-authored-by: sthetland <steve.hetland@imply.io>
* clarify that both of non-result level cache and result level cache are not supported
Co-authored-by: sthetland <steve.hetland@imply.io>
* Allow kill task to mark segments as unused
* Add IndexerSQLMetadataStorageCoordinator test
* Update docs/ingestion/data-management.md
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Add warning to kill task doc
Co-authored-by: Jihoon Son <jihoonson@apache.org>
This change allows the selection of a specific broker service (or broker tier) by the Router.
The newly added ManualTieredBrokerSelectorStrategy works as follows:
Check for the parameter brokerService in the query context. If this is a valid broker service, use it.
Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it.
Move on to the next strategy
* HLL lgK and a tip
Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200. Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated. Also added a note about using K over 16 being pretty much pointless.
* Corrected spelling
* Create datasketches-hll.md
Put roll-up back to rollup
* Update docs/development/extensions-core/datasketches-hll.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Add a new metric query/segments/count that is not emitted by default
* docs
* test the default implementation of the metric
* fix spelling error in docs
* document the fact that query retries will result in additional metric emissions
* update using recommended text from @jihoonson
* fix doc
* address comments
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.
SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects.
SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* fix test
* fix test
* Bound memory in native batch ingest create segments
* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that
* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged
* Changed name from RealtimeAppenderator to StreamAppenderator
* Style
* Incorporating tests from StreamAppenderatorTest
* Keep totalRows and cleanup code
* Added missing dep
* Fix unit test
* Checkstyle
* allowIncrementalPersists should always be true for batch
* Added sinks metadata
* clear sinks metadata when closing appenderator
* Style + minor edits to log msgs
* Update sinks metadata & totalRows when dropping a sink (segment)
* Remove max
* Intelli-j check
* Keep a count of hydrants persisted by sink for sanity check before merge
* Move out sanity
* Add previous hydrant count to sink metadata
* Remove redundant field from SinkMetadata
* Remove unneeded functions
* Cleanup unused code
* Removed unused code
* Remove unused field
* Exclude it from jacoco because it is very hard to get branch coverage
* Remove segment announcement and some other minor cleanup
* Add fallback flag
* Minor code cleanup
* Checkstyle
* Code review changes
* Update batchMemoryMappedIndex name
* Code review comments
* Exclude class from coverage, will include again when packaging gets fixed
* Moved test classes to server module
* More BatchAppenderator cleanup
* Fix bug in wrong counting of totalHydrants plus minor cleanup in add
* Removed left over comments
* Have BatchAppenderator follow the Appenderator contract for push & getSegments
* Fix LGTM violations
* Review comments
* Add stats after push is done
* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)
* Update javadocs
* Add thread safety notice to BatchAppenderator
* Further cleanup config
* More config cleanup
* Avro union support
* Document new union support
* Add support for AvroStreamInputFormat and fix checkstyle
* Extend multi-member union test schema and format
* Some additional docs and add Enums to spelling
* Rename explodeUnions -> extractUnions
* explode -> extract
* ByType
* Correct spelling error
* add single input string expression dimension vector selector and better expression planning
* better
* fixes
* oops
* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing
* oops
* javadocs, renaming
* more javadocs
* benchmarks
* use string expression vector processor with vector size 1 instead of expr.eval
* better logging
* javadocs, surprising number of the the
* more
* simplify
* Create /opt/data to fix permission problem
* eliminate symlink to avoid compatibility problem on AWS Fargate
* Add a workaround section
* Update instruction for named volume
* Use named volume in docker-compose
* Revert some doc change
* Resolve review comments
With this change, Druid will only support ZooKeeper 3.5.x and later.
In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x
(see #10780 for the detailed reasons)
* remove ZooKeeper 3.4.x compatibility
* exclude additional ZK 3.5.x netty dependencies to ensure we use our version
* keep ZooKeeper version used for integration tests in sync with client library version
* remove the need to specify ZK version at runtime for docker
* add support to run integration tests with JDK 15
* build and run unit tests with Java 15 in travis
* Avoid mapping hydrants in create segments phase for native ingestion
* Drop queriable indices after a given sink is fully merged
* Do not drop memory mappings for realtime ingestion
* Style fixes
* Renamed to match use case better
* Rollback memoization code and use the real time flag instead
* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations
* Style
* Log some count stats
* Make sure sinks size is obtained at the right time
* BatchAppenderator unit test
* Fix comment typos
* Renamed methods to make them more readable
* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator
* Missing dependency
* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.
* Replaced concurrent variables with normal ones
* Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path.
* Style fix.
* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.
* Forgot to commit this edited documentation message
* SQL timeseries no longer skip empty buckets with all granularity
* add comment, fix tests
* the ol switcheroo
* revert unintended change
* docs and more tests
* style
* make checkstyle happy
* docs fixes and more tests
* add docs, tests for array_agg
* fixes
* oops
* doc stuffs
* fix compile, match doc style
* allow user to set group.id for Kafka ingestion task
* fix test coverage by removing deprecated code and add doc
* fix typo
* Update docs/development/extensions-core/kafka-ingestion.md
Co-authored-by: frank chen <frankchen@apache.org>
Co-authored-by: frank chen <frankchen@apache.org>