* groupBy query: optional limit push down to segment scan
* make segment level limit push down configurable
* fix teamcity errors
* fix segment limit pushdown flag handling on query level config override
* use equals for comparator check
* fix sql and null handling
* fix unused imports
* handle null offset in NullableValueGroupByColumnSelectorStrategy for buffer comparator similar to RowBasedGrouperHelper.NullableRowBasedKeySerdeHelper
* fix segment underReplicated/unavailable counts to be gauges instead of counters
* fix jvm/gc/cpu to be a counter instead of timre
jvm/gc/cpu represents the total cpu time spent for multiple gc
invocations, not the time spent in each gc cycle.
the number needs to be divided by jvm/gc/count to get the average gc
time per cycle
* update docs
* fix spellcheck
* Implementing dropwizard emitter for druid
making metric manager and alert emitters as optional
* Refactor and make things work
more improvements
improve docs
refactrings
* Fix teamcity inspections
* review comments
* more review comments
* add limit to max number of gauges
* update pom version
* fix pom
* review comments
* review comment
* review comments
* fix broken doc link
review comments
review comments
* review comments
* fix checkstyle
* more spell check fixes
* fix travis failures
* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations
* Fixing indentation
* WIP
* Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy
* fixing code style
* Fixing test
* Adding a method visible only for testing, fixing tests
* 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes
* fixing the conditional statement
* Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy
* to trigger CI build
* Add documentation for the selection strategy configuration
* to re trigger CI build
* updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes
* In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail
* Implementing review comments. Added tests for StorageLocationSelectorStrategy
* Checkstyle fixes
* Adding java doc comments for StorageLocationSelectorStrategy interface
* checkstyle
* empty commit to retrigger build
* Empty commit
* Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file
* Impl review comments including updating docs as suggested
* Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose
* Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy
* Fixing the round robin iterator
* Removed numLocationsToTry, updated java docs
* changing property attribute value from tier to type
* Fixing assert messages
* Added live reports for Kafka and Native batch task
* Removed unused local variables
* Added the missing unit test
* Refine unit test logic, add implementation for HttpRemoteTaskRunner
* checksytle fixes
* Update doc descriptions for updated API
* remove unnecessary files
* Fix spellcheck complaints
* More details for api descriptions
* Adjust defaults for hashed partitioning
If neither the partition size nor the number of shards are specified,
default to partitions of 5,000,000 rows (similar to the behavior of
dynamic partitions). Previously, both could be null and cause incorrect
behavior.
Specifying both a partition size and a number of shards now results in
an error instead of ignoring the partition size in favor of using the
number of shards. This is a behavior change that makes it more apparent
to the user that only one of the two properties will be honored
(previously, a message was just logged when the specified partition size
was ignored).
* Fix test
* Handle -1 as null
* Add -1 as null tests for single dim partitioning
* Simplify logic to handle -1 as null
* Address review comments
* Rename partition spec fields
Rename partition spec fields to be consistent across the various types
(hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment
and maxRowsPerSegment in favor of targetPartitionSize and
maxSegmentSize. Consistent and clearer names are easier for users to
understand and use.
Also fix various IntelliJ inspection warnings and doc spelling mistakes.
* Fix test
* Improve docs
* Add targetRowsPerSegment to HashedPartitionsSpec
* move google ext docs from contrib to core
* fix links
* revert unintended change
* more links, add note to example ext doc that it was removed, unlink from sidebar
* Exit JVM on curator unhandled errors
If an unhandled error occurs when curator is talking to ZooKeeper, exit
the JVM in addition to stopping the lifecycle to prevent the process
from being left in a zombie state. With this change,
BoundedExponentialBackoffRetryWithQuit is no longer needed as when
curator exceeds the configured retries, it triggers its unhandled error
listeners. A new "connectionTimeoutMs" CuratorConfig setting is added
mostly to facilitate testing curator unhandled errors, but it may be
useful for users as well.
* Address review comments
* Add realization for updating version of derived segments in MaterializedView
* add unit test, and change code style for the sake of ease of understanding
* fix document's mistake of expression
* Zookeeper version is updated.
* Zookeeper version is updated at licenses.yaml
* licenses.yaml is updated and dependencies are fixed to make the project successfully build.
* Zookeeper versions are fixed at licenses.yaml
* Add group_id to overlord tasks API and sys.tasks table
* adjust test
* modify docs
* Make groupId nullable
* fix integration test
* fix toString
* Remove groupId from TaskInfo
* Modify docs and tests
* modify TaskMonitorTest
* migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time
* +x
* move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml
* add missing hdfs-storage dependencies
* revert to old syntax, fixes
* formatting
* update notices for recently updated dependencies
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks
* kill runner when it's ready
* add comment
* kill run thread
* fix test
* Take closeable out of Appenderator
* add javadoc
* fix test
* fix test
* update javadoc
* add javadoc about killed task
* address comment
* Add support for parallel native indexing with shuffle for perfect rollup.
* Add comment about volatiles
* fix test
* fix test
* handling missing exceptions
* more clear javadoc for stopGracefully
* unused import
* update javadoc
* Add missing statement in javadoc
* address comments; fix doc
* add javadoc for isGuaranteedRollup
* Rename confusing variable name and fix typos
* fix typos; move fetch() to a better home; fix the expiration time
* add support https
* Enable ability to toggle SegmentMetadata request logging on/off
* Move SegmentMetadata query log filter to FilteredRequestLogger
* Update documentation to reflect the segment metadata flag moving to the filtered request logger
* Modify patch to allow blacklist of query types to not log to request logger
* Address styling and naming requests following latest code review
* Fix indentation on multiple locations per Druid style rules
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking
* add more test
* javadoc for missingIntervalsInOverwriteMode
* Fix test
* Address comments
* avoid spotbugs
* Add IPv4 SQL functions
New SQL functions for filtering IPv4 addresses:
- IPV4_MATCH: Check if IP address belongs to a subnet
- IPV4_PARSE: Convert string IP address to integer
- IPV4_STRINGIFY: Convert integer IP address to string
These are the SQL analogs of the druid expressions with the same name.
Filtering is more efficient when operating on IP addresses as integers
instead of strings.
* Refactor operator conversions into named constants
* Add IPv4 druid expressions
New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string
These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:
1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4adress_stringify to convert to (readable) string at query
time
* Fix licenses and null handling
* Simplify IPv4 expressions
* Fix tests
* Fix check for valid ipv4 address string