* Add lastString and firstString aggregators extension
* Remove duplicated class
* Move first-last-string doc page to extensions-contrib
* Fix ObjectStrategy compare method
* Fix doc bad aggregatos type name
* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery
* Add getMaxStringBytes() method to support JSON serialization
* Fix null pointer exception at segment creation phase when the string value is null
* Control the valueSelector object class on BufferAggregators
* Perform all improvements
* Add java doc on SerializablePairLongStringSerde
* Refactor ObjectStraty compare method
* Remove unused ;
* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests
* Remove unused imports
* Add license header
* Add class name to java doc class serde
* Throw exception if value is unsupported class type
* Move first-last-string extension into druid core
* Update druid core docs
* Fix null pointer exception when pair->string is null
* Add null control unit tests
* Remove unused imports
* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query
* Change SerializablePairLongString to extend SerializablePair
* Change vars from public to private
* Convert vars to primitive type
* Clarify compare comment
* Change IllegalStateException to ISE
* Remove TODO comments
* Control possible null pointer exception
* Add @Nullable annotation
* Remove empty line
* Remove unused parameter type
* Improve AggregatorCombiner javadocs
* Add filterNullValues option at StringLast and StringFirst aggregators
* Add filterNullValues option at agg documentation
* Fix checkstyle
* Update header license
* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR
* Fix StringFirstAggregatorCombiner
* Fix if condition at StringFirstAggregateCombiner
* Remove filterNullValues from string first/last aggregators
* Add isReset flag in FirstAggregatorCombiner
* Change Arrays.asList to Collections.singletonList
- Setting druid.request.logging.delegate has no effect.
- The provider is injected based on a type parameter & this looks to be scoped to delegate for filtered loggers
* implement materialized view
* modify code according to jihoonson's comments
* modify code according to jihoonson's comments - 2
* add documentation about materialized view
* use new HadoopTuningConfig in pr 5583
* add minDataLag and fix optimizer bug
* correct value of DEFAULT_MIN_DATA_LAG_MS
* modify code according to jihoonson's comments - 3
* use the boolean expression instead of if-else
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.
* Adding documentation
* Removed some fields from class AnonymousAuthenticator
* Updating docs
* fix freeSpacePercent in segmentCache.locations
* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash
* add test to verify correct behavior
* switch to Path and test with jimfs
Path allows to use different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.
The behavior should be unchanged by this commit.
* Revert "switch to Path and test with jimfs"
This reverts commit 8b9a418d65.
* add default caffeine cache size based on runtime Xmx or max 1GB
* update docs for caffeine cache
* fix formatting
* test caffeine size should never be less than 0
* set caffeine max default size to 1G not 1M
* fix caffeine cache tests
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Fix check style and remove a comment
* Add overlord unsecured paths to coordinator when using combined service (#5579)
* Add overlord unsecured paths to coordinator when using combined service
* PR comment
* More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals
* Indentation issues
* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR #5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
* Fix taskDuration docs for KafkaIndexingService (#5572)
* With incremental handoff the changed line is no longer true.
* Add doc for automatic pendingSegments (#5565)
* Add missing doc for automatic pendingSegments
* address comments
* Fix indexTask to respect forceExtendableShardSpecs (#5509)
* Fix indexTask to respect forceExtendableShardSpecs
* add comments
* Deprecate spark2 profile in pom.xml (#5581)
Deprecated due to https://github.com/druid-io/druid/pull/5382
* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)
Also switch various firehoses to the new method.
Fixes#5585.
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Address code review comments
* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues
* Address more code review comments
* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase
* Fix some style checks
* Merge conflicts
* Fix failing tests
Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex
* Address PR comments
* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods
* Fix TeamCity inspection warnings
* Added maxBytesInMemory config to HadoopTuningConfig
* Updated the docs and examples
* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples
* Set maxBytesInMemory to 0 until used
Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts
* Update toString in KafkaSupervisorTuningConfig
* Use correct maxBytesInMemory value in AppenderatorImpl
* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory
Experimenting with various defaults, 1/3 jvm memory causes OOM
* Update docs to correct maxBytesInMemory default value
* Minor to rename and add comment
* Add more details in docs
* Address new PR comments
* Address PR comments
* Fix spelling typo
* Update defaultHadoopCoordinates in documentation.
To match changes applied in #5382.
* Remove a parameter with defaults from example configuration file.
If it has reasonable defaults, then why would it be in an example config file?
Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future.
Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes?
* Fix typo in documentation.
* Add task action metrics, add taskId metric dimension.
Adds two new metrics: task/action/log/time and task/action/run/time. Also
adds taskId as a dimension, to give us the ability to drill down into metrics
for an individual task. Also standardizes metrics-attachment using two helper
methods in IndexTaskUtils.
* Fix typo
* Use mergeBuffer instead of processingBuffer in parallelCombiner
* Fix test
* address comments
* fix test
* Fix test
* Update comment
* address comments
* fix build
* Fix test failure