mirror of https://github.com/apache/druid.git
docs: Fix broken anchor links (#15621)
parent e40b96e026
commit b8060fc93f
@@ -121,12 +121,12 @@ The following properties are automatically set by the Coordinator:
 * `id`: Generated using the task type, datasource name, interval, and timestamp. The task ID is prefixed with `coordinator-issued`.
 * `context`: Set according to the user-provided `taskContext`.
 
-Compaction tasks typically fetch all [relevant segments](compaction.md#compaction-io-configuration) prior to launching any subtasks,
+Compaction tasks typically fetch all [relevant segments](manual-compaction.md#compaction-io-configuration) prior to launching any subtasks,
 _unless_ the following properties are all set to non-null values. It is strongly recommended to set them to non-null values to
 maximize performance and minimize disk usage of the `compact` tasks launched by auto-compaction:
 
-- [`granularitySpec`](compaction.md#compaction-granularity-spec), with non-null values for each of `segmentGranularity`, `queryGranularity`, and `rollup`
-- [`dimensionsSpec`](compaction.md#compaction-dimensions-spec)
+- [`granularitySpec`](manual-compaction.md#compaction-granularity-spec), with non-null values for each of `segmentGranularity`, `queryGranularity`, and `rollup`
+- [`dimensionsSpec`](manual-compaction.md#compaction-dimensions-spec)
 - `metricsSpec`
 
 For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
@@ -306,7 +306,8 @@ Set `taskLockType` to `REPLACE` if you're replacing data. For example, if you u
 ## Learn more
 
 See the following topics for more information:
-* [Compaction](compaction.md) for an overview of compaction and how to set up manual compaction in Druid.
+* [Compaction](compaction.md) for an overview of compaction in Druid.
+* [Manual compaction](manual-compaction.md) for how to manually perform compaction tasks.
 * [Segment optimization](../operations/segment-optimization.md) for guidance on evaluating and optimizing Druid segment size.
 * [Coordinator process](../design/coordinator.md#automatic-compaction) for details on how the Coordinator plans compaction tasks.
 

@@ -74,7 +74,7 @@ For more information, see [Avoid conflicts with ingestion](../data-management/au
 
 ### Segment granularity handling
 
-Unless you modify the segment granularity in [`granularitySpec`](#compaction-granularity-spec), Druid attempts to retain the granularity for the compacted segments. When segments have different segment granularities with no overlap in interval Druid creates a separate compaction task for each to retain the segment granularity in the compacted segment.
+Unless you modify the segment granularity in [`granularitySpec`](manual-compaction.md#compaction-granularity-spec), Druid attempts to retain the granularity for the compacted segments. When segments have different segment granularities with no overlap in interval Druid creates a separate compaction task for each to retain the segment granularity in the compacted segment.
 
 If segments have different segment granularities before compaction but there is some overlap in interval, Druid attempts find start and end of the overlapping interval and uses the closest segment granularity level for the compacted segment.
 
@@ -82,7 +82,7 @@ For example consider two overlapping segments: segment "A" for the interval 01/0
 
 ### Query granularity handling
 
-Unless you modify the query granularity in the [`granularitySpec`](#compaction-granularity-spec), Druid retains the query granularity for the compacted segments. If segments have different query granularities before compaction, Druid chooses the finest level of granularity for the resulting compacted segment. For example if a compaction task combines two segments, one with day query granularity and one with minute query granularity, the resulting segment uses minute query granularity.
+Unless you modify the query granularity in the [`granularitySpec`](manual-compaction.md#compaction-granularity-spec), Druid retains the query granularity for the compacted segments. If segments have different query granularities before compaction, Druid chooses the finest level of granularity for the resulting compacted segment. For example if a compaction task combines two segments, one with day query granularity and one with minute query granularity, the resulting segment uses minute query granularity.
 
 :::info
 In Apache Druid 0.21.0 and prior, Druid sets the granularity for compacted segments to the default granularity of `NONE` regardless of the query granularity of the original segments.
@@ -108,6 +108,6 @@ You can check that your segments are rolled up or not by using [Segment Metadata
 
 See the following topics for more information:
 - [Segment optimization](../operations/segment-optimization.md) for guidance to determine if compaction will help in your case.
-- [Manual compaction](./manual-compaction.md) for how to run a one-time compaction task
+- [Manual compaction](./manual-compaction.md) for how to run a one-time compaction task.
 - [Automatic compaction](automatic-compaction.md) for how to enable and configure automatic compaction.
 

@@ -37,7 +37,7 @@ A datasource may have anywhere from just a few segments, up to hundreds of thous
 - Bitmap compression for bitmap indexes
 - Type-aware compression for all columns
 
-Periodically, segments are committed and published to [deep storage](#deep-storage), become immutable, and move from MiddleManagers to the Historical services. An entry about the segment is also written to the [metadata store](#metadata-storage). This entry is a self-describing bit of metadata about the segment, including things like the schema of the segment, its size, and its location on deep storage. These entries tell the Coordinator what data is available on the cluster.
+Periodically, segments are committed and published to [deep storage](deep-storage.md), become immutable, and move from MiddleManagers to the Historical services. An entry about the segment is also written to the [metadata store](metadata-storage.md). This entry is a self-describing bit of metadata about the segment, including things like the schema of the segment, its size, and its location on deep storage. These entries tell the Coordinator what data is available on the cluster.
 
 For details on the segment file format, see [segment files](segments.md).
 
@@ -94,7 +94,7 @@ The switch appears to happen instantaneously to a user, because Druid handles th
 
 Each segment has a lifecycle that involves the following three major areas:
 
-1. **Metadata store:** Segment metadata (a small JSON payload generally no more than a few KB) is stored in the [metadata store](../design/metadata-storage.md) once a segment is done being constructed. The act of inserting a record for a segment into the metadata store is called publishing. These metadata records have a boolean flag named `used`, which controls whether the segment is intended to be queryable or not. Segments created by realtime tasks will be
+1. **Metadata store:** Segment metadata (a small JSON payload generally no more than a few KB) is stored in the [metadata store](metadata-storage.md) once a segment is done being constructed. The act of inserting a record for a segment into the metadata store is called publishing. These metadata records have a boolean flag named `used`, which controls whether the segment is intended to be queryable or not. Segments created by realtime tasks will be
 available before they are published, since they are only published when the segment is complete and will not accept any additional rows of data.
 2. **Deep storage:** Segment data files are pushed to deep storage once a segment is done being constructed. This happens immediately before publishing metadata to the metadata store.
 3. **Availability for querying:** Segments are available for querying on some Druid data server, like a realtime task, directly from deep storage, or a Historical service.
@@ -114,7 +114,7 @@ Druid has an architectural separation between ingestion and querying, as describ
 
 On the ingestion side, Druid's primary [ingestion methods](../ingestion/index.md#ingestion-methods) are all pull-based and offer transactional guarantees. This means that you are guaranteed that ingestion using these methods will publish in an all-or-nothing manner:
 
-- Supervised "seekable-stream" ingestion methods like [Kafka](../development/extensions-core/kafka-ingestion.md) and [Kinesis](../development/extensions-core/kinesis-ingestion.md). With these methods, Druid commits stream offsets to its [metadata store](#metadata-storage) alongside segment metadata, in the same transaction. Note that ingestion of data that has not yet been published can be rolled back if ingestion tasks fail. In this case, partially-ingested data is
+- Supervised "seekable-stream" ingestion methods like [Kafka](../development/extensions-core/kafka-ingestion.md) and [Kinesis](../development/extensions-core/kinesis-ingestion.md). With these methods, Druid commits stream offsets to its [metadata store](metadata-storage.md) alongside segment metadata, in the same transaction. Note that ingestion of data that has not yet been published can be rolled back if ingestion tasks fail. In this case, partially-ingested data is
 discarded, and Druid will resume ingestion from the last committed set of stream offsets. This ensures exactly-once publishing behavior.
 - [Hadoop-based batch ingestion](../ingestion/hadoop.md). Each task publishes all segment metadata in a single transaction.
 - [Native batch ingestion](../ingestion/native-batch.md). In parallel mode, the supervisor task publishes all segment metadata in a single transaction after the subtasks are finished. In simple (single-task) mode, the single task publishes all segment metadata in a single transaction after it is complete.
@@ -137,4 +137,4 @@ When a time chunk is overwritten, a new core set of segments is created with a h
 Druid also supports an experimental segment locking mode that is activated by setting
 [`forceTimeChunkLock`](../ingestion/tasks.md#context) to false in the context of an ingestion task. In this case, Druid creates an atomic update group using the existing version for the time chunk, instead of creating a new core set with a new version number. There can be multiple atomic update groups with the same version number per time chunk. Each one replaces a specific set of earlier segments in the same time chunk and with the same version number. Druid will query the latest one that is fully available. This is a more powerful version of the core set concept, because it enables atomically replacing a subset of data for a time chunk, as well as doing atomic replacement and appending simultaneously.
 
-If segments become unavailable due to multiple Historicals going offline simultaneously (beyond your replication factor), then Druid queries will include only the segments that are still available. In the background, Druid will reload these unavailable segments on other Historicals as quickly as possible, at which point they will be included in queries again.
+If segments become unavailable due to multiple Historicals going offline simultaneously (beyond your replication factor), then Druid queries will include only the segments that are still available. In the background, Druid will reload these unavailable segments on other Historicals as quickly as possible, at which point they will be included in queries again.

@@ -340,7 +340,7 @@ The following Coordinator dynamic configs have been removed:
 * `emitBalancingStats`: Stats for errors encountered while balancing will always be emitted. Other debugging stats will not be emitted but can be logged by setting the appropriate `debugDimensions`.
 * `useBatchedSegmentSampler` and `percentOfSegmentsToConsiderPerMove`: Batched segment sampling is now the standard and will always be on.
 
-Use the new [smart segment loading](#smart-segment-loading) mode instead.
+Use the new [smart segment loading](https://druid.apache.org/docs/latest/configuration/#smart-segment-loading) mode instead.
 
 [#14524](https://github.com/apache/druid/pull/14524)
 
@@ -631,4 +631,4 @@ As [ZooKeeper 3.4 has been end-of-life for a while](https://mail-archives.apache
 
 All columns in the `sys.segments` table are now serialized in the JSON format to make them consistent with other system tables. Column names now use the same "snake case" convention.
 
-[#10481](https://github.com/apache/druid/pull/10481)
+[#10481](https://github.com/apache/druid/pull/10481)