diff --git a/docs/content/ingestion/batch-ingestion.md b/docs/content/ingestion/batch-ingestion.md
index 5118a41aec0..63cc2eb02dd 100644
--- a/docs/content/ingestion/batch-ingestion.md
+++ b/docs/content/ingestion/batch-ingestion.md
@@ -162,6 +162,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
 |combineText|Boolean|Use CombineTextInputFormat to combine multiple files into a file split. This can speed up Hadoop jobs when processing a large number of small files.|no (default == false)|
 |useCombiner|Boolean|Use Hadoop combiner to merge rows at mapper if possible.|no (default == false)|
 |jobProperties|Object|A map of properties to add to the Hadoop job configuration, see below for details.|no (default == null)|
+|indexSpec|Object|Tune how data is indexed. See below for more information.|no|
 |buildV9Directly|Boolean|Build v9 index directly instead of building v8 index and converting it to v9 format.|no (default = false)|
 |numBackgroundPersistThreads|Integer|The number of new background threads to use for incremental persists. Using this feature causes a notable increase in memory pressure and cpu usage but will make the job finish more quickly. If changing from the default of 0 (use current thread for persists), we recommend setting it to 1.|no (default == 0)|
 
@@ -186,6 +187,14 @@ The following properties can be used to tune how the MapReduce job is configured
 
 **Please note that using `mapreduce.job.user.classpath.first` is an expert feature and should not be used without a deep understanding of Hadoop and Java class loading mechanism.**
 
+#### IndexSpec
+
+|Field|Type|Description|Required|
+|-----|----|-----------|--------|
+|bitmap|String|The type of bitmap index to create. Choose from `roaring` or `concise`, or null to use the default (`concise`).|No|
+|dimensionCompression|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
+|metricCompression|String|Compression format for metric columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
+
 ### Partitioning specification
 
 Segments are always partitioned based on timestamp (according to the granularitySpec) and may be further partitioned in
diff --git a/docs/content/ingestion/stream-pull.md b/docs/content/ingestion/stream-pull.md
index c2d2fb125d3..2fe078b57d7 100644
--- a/docs/content/ingestion/stream-pull.md
+++ b/docs/content/ingestion/stream-pull.md
@@ -93,7 +93,7 @@ The property `druid.realtime.specFile` has the path of a file (absolute or relat
     },
     "tuningConfig": {
       "type" : "realtime",
-      "maxRowsInMemory": 500000,
+      "maxRowsInMemory": 75000,
       "intermediatePersistPeriod": "PT10m",
       "windowPeriod": "PT10m",
       "basePersistDirectory": "\/tmp\/realtime\/basePersist",
@@ -155,6 +155,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
 |mergeThreadPriority|int|If `-XX:+UseThreadPriorities` is properly enabled, this will set the thread priority of the merging thread to `Thread.NORM_PRIORITY` plus this value within the bounds of `Thread.MIN_PRIORITY` and `Thread.MAX_PRIORITY`. A value of 0 indicates to not change the thread priority.|no (default = 0; inherit and do not override)|
 |reportParseExceptions|Boolean|If true, exceptions encountered during parsing will be thrown and will halt ingestion. If false, unparseable rows and fields will be skipped. If an entire row is skipped, the "unparseable" counter will be incremented. If some fields in a row were parseable and some were not, the parseable fields will be indexed and the "unparseable" counter will not be incremented.|false|
 |handoffConditionTimeout|long|Milliseconds to wait for segment handoff. It must be >= 0 and 0 means wait forever.|0|
+|indexSpec|Object|Tune how data is indexed. See below for more information.|no|
 
 Before enabling thread priority settings, users are highly encouraged to read the [original pull request](https://github.com/druid-io/druid/pull/984) and other documentation about proper use of `-XX:+UseThreadPriorities`.
 
@@ -166,6 +167,13 @@ The following policies are available:
 * `messageTime` – Can be used for non-"current time" as long as that data is relatively in sequence. Events are rejected if they are less than `windowPeriod` from the event with the latest timestamp. Hand off only occurs if an event is seen after the segmentGranularity and `windowPeriod` (hand off will not periodically occur unless you have a constant stream of data).
 * `none` – All events are accepted. Never hands off data unless shutdown() is called on the configured firehose.
 
+### Index Spec
+
+|Field|Type|Description|Required|
+|-----|----|-----------|--------|
+|bitmap|String|The type of bitmap index to create. Choose from `roaring` or `concise`, or null to use the default (`concise`).|No|
+|dimensionCompression|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
+|metricCompression|String|Compression format for metric columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
 
 #### Sharding
 
diff --git a/processing/src/main/java/io/druid/segment/IndexSpec.java b/processing/src/main/java/io/druid/segment/IndexSpec.java
index 2553c0f6f1c..7674dc97185 100644
--- a/processing/src/main/java/io/druid/segment/IndexSpec.java
+++ b/processing/src/main/java/io/druid/segment/IndexSpec.java
@@ -81,7 +81,8 @@ public class IndexSpec
    * Defaults to the bitmap type specified by the (deprecated) "druid.processing.bitmap.type"
    * setting, or, if none was set, uses the default @{link BitmapSerde.DefaultBitmapSerdeFactory}
    *
-   * @param dimensionCompression compression format for dimension columns. The default, null, means no compression
+   * @param dimensionCompression compression format for dimension columns, null to use the default
+   *                             Defaults to @{link CompressedObjectStrategy.DEFAULT_COMPRESSION_STRATEGY}
    *
    * @param metricCompression compression format for metric columns, null to use the default.
    *                          Defaults to @{link CompressedObjectStrategy.DEFAULT_COMPRESSION_STRATEGY}
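
For reference, a minimal sketch of how the `indexSpec` documented above might be supplied inside a realtime `tuningConfig`. The surrounding fields mirror the example edited in this diff; the compression values are illustrative choices rather than recommendations, and `bitmap` is omitted so the documented default (`concise`) applies:

```json
"tuningConfig": {
  "type": "realtime",
  "maxRowsInMemory": 75000,
  "intermediatePersistPeriod": "PT10m",
  "windowPeriod": "PT10m",
  "basePersistDirectory": "/tmp/realtime/basePersist",
  "indexSpec": {
    "dimensionCompression": "LZ4",
    "metricCompression": "LZ4"
  }
}
```

Omitting `indexSpec` entirely keeps the behavior described in the tables above: `concise` bitmap indexes and `LZ4` compression for both dimension and metric columns.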