druid/extensions-core
Gian Merlino 1c7a03a47b
Lower default maxRowsInMemory for realtime ingestion. (#13939)

The thinking behind the previous default is that for best ingestion
throughput, we want intermediate persists to be as big as possible
without using up all available memory. So, we rely mainly on
maxBytesInMemory. The default maxRowsInMemory (1 million) is really
just a safety net: in case we have a large number of very small rows,
we don't want to get overwhelmed by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary
goal for realtime ingestion. Query performance is also important. And
because query performance is not as good on the in-memory dataset, it's
helpful to keep it from growing too large. 150k seems like a reasonable
balance here. It means that for a typical 5 million row segment, we
won't trigger more than 33 persists due to this limit, which is a
reasonable number of persists.
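
The persist count quoted above is a simple division, sketched below in
Python. This is only an illustration of the arithmetic, not Druid code:
the helper name persists_from_row_limit is hypothetical, and it assumes
the row limit is the only thing triggering persists (it ignores
maxBytesInMemory and any time-based persist triggers).

```python
def persists_from_row_limit(segment_rows: int, max_rows_in_memory: int) -> int:
    """Count how many times the in-memory row limit is reached for one segment.

    Hypothetical helper for illustration only: assumes a persist is
    triggered each time max_rows_in_memory rows have accumulated and
    that no other limit fires first.
    """
    if max_rows_in_memory <= 0:
        raise ValueError("max_rows_in_memory must be positive")
    return segment_rows // max_rows_in_memory


typical_segment_rows = 5_000_000

# New default from this commit: 150k rows in memory.
print(persists_from_row_limit(typical_segment_rows, 150_000))

# Previous default: 1 million rows in memory.
print(persists_from_row_limit(typical_segment_rows, 1_000_000))
```

Under these assumptions the 150k limit is reached 33 times for a
5 million row segment, compared with 5 times at the old 1 million
default.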

* Update tests.

* Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Fix test.

* Fix link.
2023-03-21 10:36:36 -07:00
avro-extensions sampler + type detection = bff (#13711) 2023-02-28 04:14:30 -08:00
azure-extensions Removes FiniteFirehoseFactory and its implementations (#12852) 2023-03-02 18:07:17 +05:30
datasketches Avoid creating new RelDataTypeFactory during SQL planning. (#13904) 2023-03-08 21:55:49 -08:00
druid-aws-rds-extensions merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
druid-basic-security merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
druid-bloom-filter Avoid creating new RelDataTypeFactory during SQL planning. (#13904) 2023-03-08 21:55:49 -08:00
druid-catalog merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
druid-kerberos merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
druid-pac4j merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
druid-ranger-security merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
ec2-extensions merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
google-extensions Removes FiniteFirehoseFactory and its implementations (#12852) 2023-03-02 18:07:17 +05:30
hdfs-storage Removes FiniteFirehoseFactory and its implementations (#12852) 2023-03-02 18:07:17 +05:30
histogram Avoid creating new RelDataTypeFactory during SQL planning. (#13904) 2023-03-08 21:55:49 -08:00
kafka-extraction-namespace merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
kafka-indexing-service Lower default maxRowsInMemory for realtime ingestion. (#13939) 2023-03-21 10:36:36 -07:00
kinesis-indexing-service Lower default maxRowsInMemory for realtime ingestion. (#13939) 2023-03-21 10:36:36 -07:00
kubernetes-extensions merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
lookups-cached-global Add ANSI_QUOTES propety to DBI init in lookups. (#13826) 2023-02-21 15:13:22 -08:00
lookups-cached-single merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
multi-stage-query Lower default maxRowsInMemory for realtime ingestion. (#13939) 2023-03-21 10:36:36 -07:00
mysql-metadata-storage merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
orc-extensions sampler + type detection = bff (#13711) 2023-02-28 04:14:30 -08:00
parquet-extensions Fixes parquet uint_32 datatype conversion (#13935) 2023-03-16 15:27:38 +05:30
postgresql-metadata-storage merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
protobuf-extensions sampler + type detection = bff (#13711) 2023-02-28 04:14:30 -08:00
s3-extensions Fix durable storage cleanup (#13853) 2023-03-06 09:49:14 +05:30
simple-client-sslcontext merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00
stats Avoid creating new RelDataTypeFactory during SQL planning. (#13904) 2023-03-08 21:55:49 -08:00
testing-tools merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) 2023-02-17 14:27:41 -08:00