SQL: Add is_active to sys.segments, update examples and docs. (#11550)

* SQL: Add is_active to sys.segments, update examples and docs.

is_active is short for:

  (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1

It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.

The web console already adds this filter to a lot of its queries,
proving its usefulness.

This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.

* Wording updates.

* Adjustments for spellcheck.

* Adjust IT.
This commit is contained in:
Gian Merlino 2022-05-19 14:23:28 -07:00 committed by GitHub
parent 90531fd53f
commit 65a1375b67
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 33 additions and 21 deletions

View File

@ -127,20 +127,22 @@ Segments table provides details on all Druid segments, whether they are publishe
|version|STRING|Version string (generally an ISO8601 timestamp corresponding to when the segment set was first started). Higher version means the more recently created segment. Version comparing is based on string comparison.| |version|STRING|Version string (generally an ISO8601 timestamp corresponding to when the segment set was first started). Higher version means the more recently created segment. Version comparing is based on string comparison.|
|partition_num|LONG|Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)| |partition_num|LONG|Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)|
|num_replicas|LONG|Number of replicas of this segment currently being served| |num_replicas|LONG|Number of replicas of this segment currently being served|
|num_rows|LONG|Number of rows in current segment, this value could be null if unknown to Broker at query time| |num_rows|LONG|Number of rows in this segment, or zero if the number of rows is not known.<br /><br />This row count is gathered by the Broker in the background. It will be zero if the Broker has not gathered a row count for this segment yet. For segments ingested from streams, the reported row count may lag behind the result of a `count(*)` query because the cached `num_rows` on the Broker may be out of date. This will settle shortly after new rows stop being written to that particular segment.|
|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 represents this segment has been published to the metadata store with `used=1`. See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.| |is_active|LONG|True for segments that represent the latest state of a datasource.<br /><br />Equivalent to `(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1`. In steady state, when no ingestion or data management operations are happening, `is_active` will be equivalent to `is_available`. However, they may differ from each other when ingestion or data management operations have executed recently. In these cases, Druid will load and unload segments appropriately to bring actual availability in line with the expected state given by `is_active`.|
|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any process(Historical or realtime). See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.| |is_published|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment has been published to the metadata store and is marked as used. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.|
|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is _only_ served by realtime tasks, and 0 if any historical process is serving this segment.| |is_available|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any data serving process, like a Historical or a realtime ingestion task. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.|
|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is published and is _fully_ overshadowed by some other published segments. Currently, is_overshadowed is always false for unpublished segments, although this may change in the future. You can filter for segments that "should be published" by filtering for `is_published = 1 AND is_overshadowed = 0`. Segments can briefly be both published and overshadowed if they were recently replaced, but have not been unpublished yet. See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.| |is_realtime|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is _only_ served by realtime tasks, and 0 if any Historical process is serving this segment.|
|shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`| |is_overshadowed|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is published and is _fully_ overshadowed by some other published segments. Currently, `is_overshadowed` is always 0 for unpublished segments, although this may change in the future. You can filter for segments that "should be published" by filtering for `is_published = 1 AND is_overshadowed = 0`. Segments can briefly be both published and overshadowed if they were recently replaced, but have not been unpublished yet. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.||shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`|
|dimensions|STRING|JSON-serialized form of the segment dimensions| |dimensions|STRING|JSON-serialized form of the segment dimensions|
|metrics|STRING|JSON-serialized form of the segment metrics| |metrics|STRING|JSON-serialized form of the segment metrics|
|last_compaction_state|STRING|JSON-serialized form of the compaction task's config (compaction task which created this segment). May be null if segment was not created by compaction task.| |last_compaction_state|STRING|JSON-serialized form of the compaction task's config (compaction task which created this segment). May be null if segment was not created by compaction task.|
For example to retrieve all segments for datasource "wikipedia", use the query: For example, to retrieve all currently active segments for datasource "wikipedia", use the query:
```sql ```sql
SELECT * FROM sys.segments WHERE datasource = 'wikipedia' SELECT * FROM sys.segments
WHERE datasource = 'wikipedia'
AND is_active = 1
``` ```
Another example to retrieve segments total_size, avg_size, avg_num_rows and num_segments per datasource: Another example to retrieve segments total_size, avg_size, avg_num_rows and num_segments per datasource:
@ -153,6 +155,7 @@ SELECT
CASE WHEN SUM(num_rows) = 0 THEN 0 ELSE SUM("num_rows") / (COUNT(*) FILTER(WHERE num_rows > 0)) END AS avg_num_rows, CASE WHEN SUM(num_rows) = 0 THEN 0 ELSE SUM("num_rows") / (COUNT(*) FILTER(WHERE num_rows > 0)) END AS avg_num_rows,
COUNT(*) AS num_segments COUNT(*) AS num_segments
FROM sys.segments FROM sys.segments
WHERE is_active = 1
GROUP BY 1 GROUP BY 1
ORDER BY 2 DESC ORDER BY 2 DESC
``` ```
@ -180,17 +183,15 @@ ORDER BY 1
If you want to retrieve segment that was compacted (ANY compaction): If you want to retrieve segment that was compacted (ANY compaction):
```sql ```sql
SELECT * FROM sys.segments WHERE last_compaction_state is not null SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state IS NOT NULL
``` ```
or if you want to retrieve segment that was compacted only by a particular compaction spec (such as that of the auto compaction): or if you want to retrieve segment that was compacted only by a particular compaction spec (such as that of the auto compaction):
```sql ```sql
SELECT * FROM sys.segments WHERE last_compaction_state == 'SELECT * FROM sys.segments where last_compaction_state = 'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000, maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, compressRunOnSerialization=true}, dimensionCompression=lz4, metricCompression=lz4, longEncoding=longs, segmentLoader=null}}' SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state = 'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000, maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, compressRunOnSerialization=true}, dimensionCompression=lz4, metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'
``` ```
*Caveat:* Note that a segment can be served by more than one stream ingestion tasks or Historical processes, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple ingestion tasks, until a segment is eventually served by a Historical, at that point the segment is immutable. Broker prefers to query a segment from Historical over an ingestion task. But if a segment has multiple realtime replicas, for e.g.. Kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks because only one of the ingestion tasks is queried by the Broker and it is not guaranteed that the same task gets picked every time. The `num_rows` column of segments table can have inconsistent values during this period. There is an open [issue](https://github.com/apache/druid/issues/5915) about this inconsistency with stream ingestion tasks.
### SERVERS table ### SERVERS table
Servers table lists all discovered servers in the cluster. Servers table lists all discovered servers in the cluster.

View File

@ -9,6 +9,7 @@
"partition_num": 0, "partition_num": 0,
"num_replicas": 1, "num_replicas": 1,
"num_rows": 4462111, "num_rows": 4462111,
"is_active": 1,
"is_published": 1, "is_published": 1,
"is_available": 1, "is_available": 1,
"is_realtime": 0, "is_realtime": 0,

View File

@ -123,6 +123,8 @@ public class SystemSchema extends AbstractSchema
* where 1 = true and 0 = false to make it easy to count number of segments * where 1 = true and 0 = false to make it easy to count number of segments
* which are published, available etc. * which are published, available etc.
*/ */
private static final long IS_ACTIVE_FALSE = 0L;
private static final long IS_ACTIVE_TRUE = 1L;
private static final long IS_PUBLISHED_FALSE = 0L; private static final long IS_PUBLISHED_FALSE = 0L;
private static final long IS_PUBLISHED_TRUE = 1L; private static final long IS_PUBLISHED_TRUE = 1L;
private static final long IS_AVAILABLE_TRUE = 1L; private static final long IS_AVAILABLE_TRUE = 1L;
@ -140,6 +142,7 @@ public class SystemSchema extends AbstractSchema
.add("partition_num", ColumnType.LONG) .add("partition_num", ColumnType.LONG)
.add("num_replicas", ColumnType.LONG) .add("num_replicas", ColumnType.LONG)
.add("num_rows", ColumnType.LONG) .add("num_rows", ColumnType.LONG)
.add("is_active", ColumnType.LONG)
.add("is_published", ColumnType.LONG) .add("is_published", ColumnType.LONG)
.add("is_available", ColumnType.LONG) .add("is_available", ColumnType.LONG)
.add("is_realtime", ColumnType.LONG) .add("is_realtime", ColumnType.LONG)
@ -313,7 +316,10 @@ public class SystemSchema extends AbstractSchema
(long) segment.getShardSpec().getPartitionNum(), (long) segment.getShardSpec().getPartitionNum(),
numReplicas, numReplicas,
numRows, numRows,
IS_PUBLISHED_TRUE, //is_published is true for published segments //is_active is true for published segments that are not overshadowed
val.isOvershadowed() ? IS_ACTIVE_FALSE : IS_ACTIVE_TRUE,
//is_published is true for published segments
IS_PUBLISHED_TRUE,
isAvailable, isAvailable,
isRealtime, isRealtime,
val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE, val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE,
@ -350,8 +356,10 @@ public class SystemSchema extends AbstractSchema
(long) val.getValue().getSegment().getShardSpec().getPartitionNum(), (long) val.getValue().getSegment().getShardSpec().getPartitionNum(),
numReplicas, numReplicas,
val.getValue().getNumRows(), val.getValue().getNumRows(),
IS_PUBLISHED_FALSE, // is_active is true for unpublished segments iff they are realtime
val.getValue().isRealtime() /* is_active */,
// is_published is false for unpublished segments // is_published is false for unpublished segments
IS_PUBLISHED_FALSE,
// is_available is assumed to be always true for segments announced by historicals or realtime tasks // is_available is assumed to be always true for segments announced by historicals or realtime tasks
IS_AVAILABLE_TRUE, IS_AVAILABLE_TRUE,
val.getValue().isRealtime(), val.getValue().isRealtime(),

View File

@ -520,7 +520,7 @@ public class SystemSchemaTest extends CalciteTestBase
final RelDataType rowType = segmentsTable.getRowType(new JavaTypeFactoryImpl()); final RelDataType rowType = segmentsTable.getRowType(new JavaTypeFactoryImpl());
final List<RelDataTypeField> fields = rowType.getFieldList(); final List<RelDataTypeField> fields = rowType.getFieldList();
Assert.assertEquals(17, fields.size()); Assert.assertEquals(18, fields.size());
final SystemSchema.TasksTable tasksTable = (SystemSchema.TasksTable) schema.getTableMap().get("tasks"); final SystemSchema.TasksTable tasksTable = (SystemSchema.TasksTable) schema.getTableMap().get("tasks");
final RelDataType sysRowType = tasksTable.getRowType(new JavaTypeFactoryImpl()); final RelDataType sysRowType = tasksTable.getRowType(new JavaTypeFactoryImpl());
@ -708,14 +708,15 @@ public class SystemSchemaTest extends CalciteTestBase
Assert.assertEquals(partitionNum, row[6]); Assert.assertEquals(partitionNum, row[6]);
Assert.assertEquals(numReplicas, row[7]); Assert.assertEquals(numReplicas, row[7]);
Assert.assertEquals(numRows, row[8]); Assert.assertEquals(numRows, row[8]);
Assert.assertEquals(isPublished, row[9]); Assert.assertEquals((((isPublished == 1) && (isOvershadowed == 0)) || (isRealtime == 1)) ? 1L : 0L, row[9]);
Assert.assertEquals(isAvailable, row[10]); Assert.assertEquals(isPublished, row[10]);
Assert.assertEquals(isRealtime, row[11]); Assert.assertEquals(isAvailable, row[11]);
Assert.assertEquals(isOvershadowed, row[12]); Assert.assertEquals(isRealtime, row[12]);
Assert.assertEquals(isOvershadowed, row[13]);
if (compactionState == null) { if (compactionState == null) {
Assert.assertNull(row[16]); Assert.assertNull(row[17]);
} else { } else {
Assert.assertEquals(mapper.writeValueAsString(compactionState), row[16]); Assert.assertEquals(mapper.writeValueAsString(compactionState), row[17]);
} }
} }

View File

@ -533,6 +533,7 @@ error_msg
exprs exprs
group_id group_id
interval_expr interval_expr
is_active
is_available is_available
is_leader is_leader
is_overshadowed is_overshadowed