druid

History

Gian Merlino 5b6727f319 Enable vectorized virtual column processing by default. (#12520 ) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue	2022-05-16 15:43:53 +05:30
..
src	Enable vectorized virtual column processing by default. (#12520 )	2022-05-16 15:43:53 +05:30
pom.xml	Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634 )	2022-05-11 22:06:20 -07:00

Enable vectorized virtual column processing by default. (#12520 )

In the majority of cases, this improves performance.

There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing.

IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue

2022-05-16 15:43:53 +05:30

src

Enable vectorized virtual column processing by default. (#12520 )

2022-05-16 15:43:53 +05:30

pom.xml

Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. (#11634 )

2022-05-11 22:06:20 -07:00