druid/docs/ingestion
Clint Wylie f8301a314f
generic block compressed complex columns (#16863)
changes:
* Adds new `CompressedComplexColumn`, `CompressedComplexColumnSerializer`, `CompressedComplexColumnSupplier` based on `CompressedVariableSizedBlobColumn` used by JSON columns
* Adds `IndexSpec.complexMetricCompression` which can be used to specify compression for the generic compressed complex column. Defaults to uncompressed because compressed columns are not backwards compatible.
* Adds a new definition of `ComplexMetricSerde.getSerializer` which accepts an `IndexSpec` argument when creating a serializer. The old signature has been marked `@Deprecated` and has a default implementation that returns `null`. If the old signature is overridden to return a non-null value, the default implementation of the new method uses that value. Otherwise, the default implementation of the new method will use a `CompressedComplexColumnSerializer` if `IndexSpec.complexMetricCompression` is not null/none/uncompressed, or will use `LargeColumnSupportedComplexColumnSerializer` otherwise.
* Moved all duplicate generic implementations of `ComplexMetricSerde.getSerializer` and `ComplexMetricSerde.deserializeColumn` into default implementations in `ComplexMetricSerde` instead of copying them all over the place. The default implementation of `deserializeColumn` checks whether the first byte indicates that the new compression was used, and otherwise uses the `GenericIndexed` based supplier.
* Complex columns with custom serializers/deserializers are unaffected and may continue doing whatever it is they do, with specialized compression or anything else. The new classes only provide generic implementations built around `ObjectStrategy`.
* add `ObjectStrategy.readRetainsBufferReference` so `CompressedComplexColumn` only copies on read if required
* pass a `copyValueOnRead` flag down to `CompressedBlockReader` to avoid a buffer duplicate if the value needs to be copied anyway
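
The serializer selection described above for the default `ComplexMetricSerde.getSerializer` can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Druid's actual code: the class `SerializerChoiceSketch`, the method `chooseSerializer`, and the use of plain strings in place of real serializer objects and of the `IndexSpec.complexMetricCompression` value are all hypothetical stand-ins, and `"lz4"` is only an example of a compression setting.

```java
public class SerializerChoiceSketch {
    // Hypothetical stand-in for the selection logic in the commit message.
    // legacySerializer: what the deprecated getSerializer overload returned
    //                   (null when it was not overridden).
    // complexMetricCompression: stand-in for IndexSpec.complexMetricCompression.
    static String chooseSerializer(String legacySerializer, String complexMetricCompression) {
        // A non-null result from the deprecated overload takes precedence.
        if (legacySerializer != null) {
            return legacySerializer;
        }
        // null, "none", and "uncompressed" all mean "do not compress".
        boolean compressed = complexMetricCompression != null
            && !complexMetricCompression.equalsIgnoreCase("none")
            && !complexMetricCompression.equalsIgnoreCase("uncompressed");
        return compressed
            ? "CompressedComplexColumnSerializer"
            : "LargeColumnSupportedComplexColumnSerializer";
    }

    public static void main(String[] args) {
        System.out.println(chooseSerializer(null, "lz4"));
        System.out.println(chooseSerializer(null, null));
        System.out.println(chooseSerializer("CustomSerializer", "lz4"));
    }
}
```

Because the uncompressed path is the default, existing segments and readers see no change unless compression is explicitly requested.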
2024-08-27 00:34:41 -07:00
..
concurrent-append-replace.md Docs: Update list of ingestion types that support concurrent append and replace (#16852) 2024-08-08 08:06:22 +05:30
data-formats.md Kinesis input format docs (#16840) 2024-08-06 18:53:10 -04:00
faq.md API reference refactor (#14372) 2023-06-26 15:48:54 -07:00
hadoop.md docs: Update Azure extension (#16585) 2024-06-20 09:31:29 -07:00
index.md [Docs] Refactor streaming ingestion section (#15591) 2024-02-12 13:52:42 -08:00
ingestion-spec.md generic block compressed complex columns (#16863) 2024-08-27 00:34:41 -07:00
input-sources.md docs: Update Azure extension (#16585) 2024-06-20 09:31:29 -07:00
kafka-ingestion.md Remove references to chatAsync (#16950) 2024-08-23 13:21:07 +05:30
kinesis-ingestion.md Kinesis input format docs (#16840) 2024-08-06 18:53:10 -04:00
native-batch-firehose.md remove Firehose and FirehoseFactory (#16758) 2024-07-19 14:37:21 -07:00
native-batch-simple-task.md Docs: Change single-dim to hashed in example for index task (#15529) 2024-02-26 09:16:10 +05:30
native-batch.md docs: Update Azure extension (#16585) 2024-06-20 09:31:29 -07:00
partitioning.md Segments sorted by non-time columns. (#16849) 2024-08-23 08:24:43 -07:00
rollup.md [Docs] Refactor streaming ingestion section (#15591) 2024-02-12 13:52:42 -08:00
schema-design.md fix array presenting columns to not match single element arrays to scalars for equality (#15503) 2023-12-08 01:22:07 -08:00
schema-model.md Update Ingestion section (#14023) 2023-05-19 09:42:27 -07:00
standalone-realtime.md [Docs] Refactor streaming ingestion section (#15591) 2024-02-12 13:52:42 -08:00
streaming.md [Docs] Refactor streaming ingestion section (#15591) 2024-02-12 13:52:42 -08:00
supervisor.md Web console: better sql data loader reset (#16696) 2024-07-11 14:45:04 -07:00
tasks.md Track IngestionState more accurately in realtime tasks. (#16934) 2024-08-22 11:43:46 +05:30
tranquility.md Remove index_realtime and index_realtime_appenderator tasks (#16602) 2024-06-24 20:13:33 -07:00