mirror of https://github.com/apache/druid.git
Add warning comments to Granularity.getIterable. (#13888)
This function is notorious for causing memory exhaustion and excessive CPU usage; so much so that it was valuable to work around it in the SQL planner in #13206. Hopefully, a warning comment will encourage developers to stay away and come up with solutions that do not involve computing all possible buckets.
parent 38b6373bf7
commit fcfb7b8ff6
@@ -209,6 +209,19 @@ public abstract class Granularity implements Cacheable
     return vals;
   }
 
+  /**
+   * Return an iterable of granular buckets that overlap a particular interval.
+   *
+   * In cases where the number of granular buckets is very large, the Iterable returned by this method will take
+   * an excessive amount of time to compute, and materializing it into a collection will take an excessive amount
+   * of memory. For example, this happens in the extreme case of an input interval of
+   * {@link org.apache.druid.java.util.common.Intervals#ETERNITY} and any granularity other than
+   * {@link Granularities#ALL}, as well as cases like an input interval of ten years with {@link Granularities#SECOND}.
+   *
+   * To avoid issues stemming from large numbers of buckets, this method should be avoided, and code that uses
+   * this method should be rewritten to use some other approach. For example: rather than computing all possible
+   * buckets in a wide time range, only process buckets related to actual data points that appear.
+   */
   public Iterable<Interval> getIterable(final Interval input)
   {
     return new IntervalIterable(input);
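The Javadoc's suggestion to process only buckets tied to actual data points can be sketched as follows. This is a minimal illustration using `java.time` rather than Druid's Joda-Time based `Granularity` classes. `SparseBuckets` and `countByBucket` are hypothetical names that are not part of Druid, and fixed-size buckets stand in for real granularities.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names are Druid APIs.
final class SparseBuckets
{
  private SparseBuckets()
  {
  }

  /**
   * Groups timestamps by the start of the fixed-size bucket that contains each one.
   * Work and memory scale with the number of data points, not with the width of
   * the time range that covers them.
   */
  static Map<Instant, Integer> countByBucket(final List<Instant> timestamps, final Duration bucketSize)
  {
    final long sizeMillis = bucketSize.toMillis();
    if (sizeMillis <= 0) {
      throw new IllegalArgumentException("bucketSize must be at least one millisecond");
    }

    final Map<Instant, Integer> counts = new TreeMap<>();
    for (Instant t : timestamps) {
      final long millis = t.toEpochMilli();
      // floorMod keeps the truncation correct for instants before the epoch.
      final long bucketStart = millis - Math.floorMod(millis, sizeMillis);
      counts.merge(Instant.ofEpochMilli(bucketStart), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args)
  {
    final List<Instant> points = List.of(
        Instant.parse("2013-01-01T00:00:00.250Z"),
        Instant.parse("2013-01-01T00:00:00.900Z"),
        Instant.parse("2023-01-01T00:00:05Z")
    );
    // The points span ten years, but only the two occupied one-second buckets are created.
    System.out.println(countByBucket(points, Duration.ofSeconds(1)));
  }
}
```

The trade-off is that empty buckets never appear in the result, so a caller that needs zero-filled output has to add the gaps separately, over a bounded range.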
@@ -871,6 +871,8 @@ public class DruidQuery
    * <p>
    * Necessary because some combinations are unsafe, mainly because they would lead to the creation of too many
    * time-granular buckets during query processing.
+   *
+   * @see Granularity#getIterable(Interval) the problematic method call we are trying to avoid
    */
   private static boolean canUseQueryGranularity(
       final DataSource dataSource,
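To give a sense of scale for "too many time-granular buckets", the sketch below estimates how many fixed-size buckets an interval would produce before anything is enumerated. It is not how `canUseQueryGranularity` is implemented. `BucketCountEstimate`, `estimateBuckets`, `isSafeToEnumerate`, and the `maxBuckets` threshold are all hypothetical, and `java.time` is used in place of Druid's Joda-Time types.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration; none of these names are Druid APIs.
final class BucketCountEstimate
{
  private BucketCountEstimate()
  {
  }

  /**
   * Estimates the number of fixed-size buckets needed to cover [start, end),
   * assuming start is aligned to a bucket boundary. Duration.toMillis() throws
   * ArithmeticException if the span is too large to express in milliseconds.
   */
  static long estimateBuckets(final Instant start, final Instant end, final Duration bucketSize)
  {
    final long sizeMillis = bucketSize.toMillis();
    if (sizeMillis <= 0 || end.isBefore(start)) {
      throw new IllegalArgumentException("need a positive bucket size and start <= end");
    }
    final long spanMillis = Duration.between(start, end).toMillis();
    // Ceiling division: a partial trailing bucket still counts as one bucket.
    return spanMillis / sizeMillis + (spanMillis % sizeMillis == 0 ? 0 : 1);
  }

  /** Returns true if the estimated bucket count stays within the caller-chosen limit. */
  static boolean isSafeToEnumerate(
      final Instant start,
      final Instant end,
      final Duration bucketSize,
      final long maxBuckets
  )
  {
    return estimateBuckets(start, end, bucketSize) <= maxBuckets;
  }

  public static void main(String[] args)
  {
    final Instant start = Instant.parse("2013-01-01T00:00:00Z");
    final Instant end = Instant.parse("2023-01-01T00:00:00Z");
    // Ten years at one-second buckets: 3652 days * 86400 seconds = 315532800 buckets.
    System.out.println(estimateBuckets(start, end, Duration.ofSeconds(1)));
    System.out.println(isSafeToEnumerate(start, end, Duration.ofSeconds(1), 100_000L));
  }
}
```

A check of this kind costs a few arithmetic operations, whereas iterating the same interval at second granularity would visit more than 315 million buckets.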