mirror of https://github.com/apache/druid.git
Commit 1c7a03a47b:

* Lower default maxRowsInMemory for realtime ingestion.

  The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory. So, we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety net: in case we have a large number of very small rows, we don't want to get overwhelmed by per-row overheads.

  However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion. Query performance is also important, and because query performance is not as good on the in-memory dataset, it's helpful to keep that dataset from growing too large. 150k seems like a reasonable balance here: for a typical 5 million row segment, this limit won't trigger more than 33 persists, which is a reasonable number.

* Update tests.

* Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java

* Fix test.

* Fix link.

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
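To make the persist arithmetic above concrete, here is a minimal standalone Java sketch (not code from the Druid repo; the class and variable names are made up for illustration) that reproduces the bound quoted in the commit message:

```java
// Sketch of the reasoning behind the new default: how many intermediate
// persists the maxRowsInMemory limit alone can trigger while building a
// segment of a given size.
public class PersistCountSketch {
    public static void main(String[] args) {
        final long maxRowsInMemory = 150_000;      // new default from the commit
        final long typicalSegmentRows = 5_000_000; // "typical" segment size cited above

        // A persist fires each time maxRowsInMemory rows accumulate in memory,
        // so this limit can trigger at most floor(segmentRows / maxRowsInMemory)
        // persists per segment.
        final long maxPersists = typicalSegmentRows / maxRowsInMemory;
        System.out.println("At most " + maxPersists + " persists per segment"); // 33
    }
}
```

In practice the byte-based limit (maxBytesInMemory) may trigger a persist first; the row count is only the upper bound described in the commit message.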
Directories:

* avro-extensions
* azure-extensions
* datasketches
* druid-aws-rds-extensions
* druid-basic-security
* druid-bloom-filter
* druid-catalog
* druid-kerberos
* druid-pac4j
* druid-ranger-security
* ec2-extensions
* google-extensions
* hdfs-storage
* histogram
* kafka-extraction-namespace
* kafka-indexing-service
* kinesis-indexing-service
* kubernetes-extensions
* lookups-cached-global
* lookups-cached-single
* multi-stage-query
* mysql-metadata-storage
* orc-extensions
* parquet-extensions
* postgresql-metadata-storage
* protobuf-extensions
* s3-extensions
* simple-client-sslcontext
* stats
* testing-tools