Apache Druid: a high-performance, real-time analytics database.
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it can sometimes
lead to an OutOfMemoryError. A lower value leads to frequent persists, which can hurt
query performance, while a higher value limits the number of persists but requires more
JVM heap space and can still lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem: it limits the total number of bytes
kept in memory before persisting.

 * The default value is one third of Runtime.maxMemory() (later in this PR lowered to one sixth)
 * To maintain the current behaviour, set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected, i.e. the first one to exceed its threshold will trigger a persist (see the sketch below)
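
Since the interaction of the two limits is only described in prose above, here is a minimal sketch of the "first limit crossed triggers a persist" rule. The class name, field names, and numeric values are hypothetical and are not Druid's actual Appenderator/IncrementalIndex code.

```java
/**
 * Minimal, hypothetical sketch of the dual-limit persist rule described above.
 * This is an illustration only, not Druid's actual indexing code.
 */
public class PersistThresholds
{
  private final long maxRowsInMemory;
  private final long maxBytesInMemory; // -1 disables the byte-based limit

  public PersistThresholds(long maxRowsInMemory, long maxBytesInMemory)
  {
    this.maxRowsInMemory = maxRowsInMemory;
    this.maxBytesInMemory = maxBytesInMemory;
  }

  /** Persist as soon as either limit is exceeded, whichever happens first. */
  public boolean shouldPersist(long rowsInMemory, long bytesInMemory)
  {
    return rowsInMemory >= maxRowsInMemory
           || (maxBytesInMemory > 0 && bytesInMemory >= maxBytesInMemory);
  }

  public static void main(String[] args)
  {
    PersistThresholds thresholds = new PersistThresholds(75_000, 500_000_000);
    System.out.println(thresholds.shouldPersist(80_000, 100_000_000)); // true: row limit crossed first
    System.out.println(thresholds.shouldPersist(10_000, 600_000_000)); // true: byte limit crossed first
  }
}
```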

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.
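
The commit above only names the newly supported formats. As a rough, hedged illustration of extension-based decompression (this is not CompressionUtils' actual API, and Apache Commons Compress is an assumption about the underlying library), one approach looks like this:

```java
import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorStreamFactory;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

public class DecompressSketch
{
  /**
   * Returns an InputStream over the decompressed contents of the given file.
   * xz, bz2, and gz are auto-detected by Commons Compress; zip is handled
   * separately because it is an archive format (only the first entry is read).
   */
  static InputStream open(String path) throws IOException, CompressorException
  {
    BufferedInputStream in = new BufferedInputStream(new FileInputStream(path));
    if (path.endsWith(".zip")) {
      ZipInputStream zip = new ZipInputStream(in);
      zip.getNextEntry(); // position the stream at the first entry
      return zip;
    }
    return new CompressorStreamFactory().createCompressorInputStream(in);
  }
}
```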

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Add a unit test for multiple sinks in a single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set maxBytesInMemory to 0 if the user does not set it as part of the tuningConfig,
and resolve it to a fraction of the maximum JVM heap when the ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 of the maximum JVM heap

Experimenting with various defaults showed that 1/3 of the JVM heap causes OOM (see the sketch below)
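
A small sketch of the default resolution described in the commits above: 0 means the user did not set maxBytesInMemory and it falls back to one sixth of the maximum JVM heap at task start, while -1 disables the byte limit. The class and method names here are illustrative, not Druid's actual TuningConfig code.

```java
/**
 * Illustrative sketch (not Druid's actual TuningConfig code) of resolving
 * maxBytesInMemory when an ingestion task starts.
 */
public class MaxBytesDefaults
{
  // 0 means "not set by the user"; -1 means "byte limit disabled".
  static long resolveMaxBytesInMemory(long configuredMaxBytesInMemory)
  {
    if (configuredMaxBytesInMemory == 0) {
      // Default per the commit above: one sixth of the maximum JVM heap.
      return Runtime.getRuntime().maxMemory() / 6;
    }
    return configuredMaxBytesInMemory;
  }

  public static void main(String[] args)
  {
    System.out.println(resolveMaxBytesInMemory(0));            // heap-derived default
    System.out.println(resolveMaxBytesInMemory(-1));           // -1: byte limit disabled
    System.out.println(resolveMaxBytesInMemory(250_000_000L)); // explicit user value
  }
}
```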

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
.idea Remove unused code and exception declarations (#5461) 2018-03-16 22:11:12 +01:00
api Use unique segment paths for Kafka indexing (#5692) 2018-04-29 21:59:48 -07:00
aws-common Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702) 2018-05-02 10:45:38 -07:00
benchmarks Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335) 2018-04-27 17:34:32 -07:00
ci Add TeamCity instructions (#5379) 2018-02-10 13:13:33 -08:00
codestyle Add GenericWhitespace checkstyle check (#5668) 2018-04-24 01:09:14 +05:30
common Use mergeBuffer instead of processingBuffer in parallelCombiner (#5634) 2018-04-27 18:14:37 -07:00
distribution Opentsdb emitter extension (#5380) 2018-02-13 13:10:22 -08:00
docs 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
examples 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
extendedset Remove unused code and exception declarations (#5461) 2018-03-16 22:11:12 +01:00
extensions-contrib 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
extensions-core 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
hll Remove unused code and exception declarations (#5461) 2018-03-16 22:11:12 +01:00
indexing-hadoop 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
indexing-service 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
integration-tests Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702) 2018-05-02 10:45:38 -07:00
java-util fix NPE when buffersList contains null in SmooshedFileMapper (#5689) 2018-04-27 18:15:04 -07:00
processing 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
publications Changes to lambda architecture paper required for HICSS (#3382) 2016-09-06 21:32:21 -07:00
server 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
services 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583) 2018-05-03 16:25:58 -07:00
sql SQL: Remove some unused code. (#5690) 2018-04-24 11:42:16 -07:00
.gitignore git ignore dependency-reduced-pom.xml (#4711) 2017-08-23 10:10:50 -07:00
.travis.yml Use the official aws-sdk instead of jet3t (#5382) 2018-03-21 15:36:54 -07:00
CONTRIBUTING.md Replace dev list references in docs. (#5723) 2018-04-30 11:25:45 -07:00
DruidCorporateCLA.pdf fix CLA email / mailing address 2014-04-17 15:26:28 -07:00
DruidIndividualCLA.pdf fix CLA email / mailing address 2014-04-17 15:26:28 -07:00
INTELLIJ_SETUP.md Prohibit and remove unused declarations in the processing module (#4930) 2017-11-09 09:27:27 -08:00
LICENSE Clean up README and license 2015-02-18 23:09:28 -08:00
NOTICE Extension points for authentication/authorization (#4271) 2017-09-15 23:45:48 -07:00
README.md Replace dev list references in docs. (#5723) 2018-04-30 11:25:45 -07:00
druid_intellij_formatting.xml Make formatting IntelliJ 2016 friendly (#2978) 2016-05-18 12:42:21 -07:00
eclipse.importorder Merge pull request #2905 from javasoze/eclipse_formatting 2016-04-29 18:42:03 -07:00
eclipse_formatting.xml Merge pull request #2905 from javasoze/eclipse_formatting 2016-04-29 18:42:03 -07:00
intellij-sdk-config.jpg Prohibit and remove unused declarations in the processing module (#4930) 2017-11-09 09:27:27 -08:00
pom.xml CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586) 2018-04-06 08:06:45 -07:00
upload.sh upload.sh: Use awscli if s3cmd is not available. (#3114) 2016-06-08 17:01:46 -07:00

README.md


Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments.

Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

Druid can load both streaming and batch data and integrates with Samza, Kafka, Storm, Spark, and Hadoop.

License

Apache License, Version 2.0

More Information

More information about Druid can be found on http://www.druid.io.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs/content in this repository and submit a pull request.

Getting Started

You can get started with Druid with our quickstart.

Reporting Issues

If you find any bugs, please file a GitHub issue.

Community

The Druid community is in the process of migrating to Apache by way of the Apache Incubator. Eventually, as we proceed along this path, our site will move from http://druid.io/ to https://druid.apache.org/.

Community support is available on the druid-user mailing list (druid-user@googlegroups.com), which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

We also have a couple of people hanging out on IRC in #druid-dev on irc.freenode.net.

Contributing

Please follow the guidelines listed here.