From 0325f62af2f1fd7502153c48dd22f1afade669bd Mon Sep 17 00:00:00 2001
From: Charles Smith
Date: Wed, 27 Nov 2024 11:41:28 -0800
Subject: [PATCH] [Docs] Remove ambiguous advice regarding TopN correctness (#17522)

---
 docs/querying/topnquery.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/querying/topnquery.md b/docs/querying/topnquery.md
index 663ae2ce7db..25d8b224802 100644
--- a/docs/querying/topnquery.md
+++ b/docs/querying/topnquery.md
@@ -32,7 +32,7 @@ sidebar_label: "TopN"
 
 Apache Druid TopN queries return a sorted set of results for the values in a given dimension according to some criteria. Conceptually, they can be thought of as an approximate [GroupByQuery](../querying/groupbyquery.md) over a single dimension with an [Ordering](../querying/limitspec.md) spec. TopNs are much faster and resource efficient than GroupBys for this use case. These types of queries take a topN query object and return an array of JSON objects where each object represents a value asked for by the topN query.
 
-TopNs are approximate in that each data process will rank their top K results and only return those top K results to the Broker. K, by default in Druid, is `max(1000, threshold)`. In practice, this means that if you ask for the top 1000 items ordered, the correctness of the first ~900 items will be 100%, and the ordering of the results after that is not guaranteed. TopNs can be made more accurate by increasing the threshold.
+TopNs are approximate in that each data process will rank their top K results and only return those top K results to the Broker. K, by default in Druid, is `max(1000, threshold)`.
 
 A topN query object looks like:
 
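To make the approximation in the retained sentence concrete, here is a minimal Python sketch, not Druid's actual implementation, of the described behavior: each data process keeps only its own top K partial aggregates, and the Broker merges whatever it receives. All function names and the toy data are hypothetical. The optional `k` argument is an illustration-only override so that truncation shows up with two tiny "processes"; when it is omitted, the sketch uses the documented default `max(1000, threshold)`.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Partials = Dict[str, int]  # dimension value -> partial metric (e.g. a count)


def default_k(threshold: int) -> int:
    """Per-process cutoff as documented: K = max(1000, threshold)."""
    return max(1000, threshold)


def _ranked(values: Partials) -> List[Tuple[str, int]]:
    # Descending by metric; ties broken by value name (arbitrary, for determinism).
    return sorted(values.items(), key=lambda kv: (-kv[1], kv[0]))


def local_top_k(partials: Partials, k: int) -> Partials:
    """A data process ranks its own partials and keeps only the top k."""
    return dict(_ranked(partials)[:k])


def broker_merge(results: List[Partials], threshold: int) -> List[Tuple[str, int]]:
    """The Broker sums what each process returned and keeps the top `threshold`."""
    merged: Counter = Counter()
    for partial in results:
        merged.update(partial)  # adds metric values for keys seen on several processes
    return _ranked(dict(merged))[:threshold]


def approximate_topn(processes: List[Partials], threshold: int,
                     k: Optional[int] = None) -> List[Tuple[str, int]]:
    """Truncate each process to its top K before merging."""
    k = default_k(threshold) if k is None else k
    return broker_merge([local_top_k(p, k) for p in processes], threshold)


def exact_topn(processes: List[Partials], threshold: int) -> List[Tuple[str, int]]:
    """Reference answer: merge every partial with no per-process cutoff."""
    return broker_merge(processes, threshold)


# Toy data: "b" is second on both processes but has the largest total (18).
processes = [{"a": 10, "b": 9}, {"c": 11, "b": 9}]
print(exact_topn(processes, threshold=1))             # [('b', 18)]
print(approximate_topn(processes, threshold=1, k=1))  # [('c', 11)]
print(approximate_topn(processes, threshold=1))       # [('b', 18)], K defaults to 1000
```

With `k=1`, neither process reports `b`, so the Broker returns `c` even though `b` has the largest total. Because the error depends on how values happen to be spread across processes, a fixed rule of thumb about how many leading results are exact is hard to state in general.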