* Enable code coverage
Code coverage was disabled via
https://github.com/apache/incubator-druid/pull/3122 due to an issue with
cobertura in Travis CI. Switch code coverage tool from cobertura to
jacoco to avoid issue and re-enable coveralls for Travis CI.
* Exclude non-production code
* Exclude benchmark generated code
* Exclude DruidTestRunnerFactory
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks
* kill runner when it's ready
* add comment
* kill run thread
* fix test
* Take closeable out of Appenderator
* add javadoc
* fix test
* fix test
* update javadoc
* add javadoc about killed task
* address comment
* Add support for parallel native indexing with shuffle for perfect rollup.
* Add comment about volatiles
* fix test
* fix test
* handling missing exceptions
* more clear javadoc for stopGracefully
* unused import
* update javadoc
* Add missing statement in javadoc
* address comments; fix doc
* add javadoc for isGuaranteedRollup
* Rename confusing variable name and fix typos
* fix typos; move fetch() to a better home; fix the expiration time
* add support https
* make double sum/min/max agg work on string columns
* style and compilation fixes
* fix tests
* address review comments
* add comment on SimpleDoubleAggregatorFactory
* make checkstyle happy
After enabling parallel builds for "mvn install", the sigar dependency
would sometimes resolve to the incorrect artifact repo for some of the
maven modules. This issue seems to be fixed by moving the definition of
the sigar dependency's artifact repo to the root POM.
Also, depending on network speeds, "mvn -q install" may take longer than
the default 10 minute timeout to print any output. Use travis_wait to
extend the timeout to 15 minutes.
* Use partitionsSpec for all task types
* fix doc
* fix typos and revert to use isPushRequired
* address comments
* move partitionsSpec to core
* remove hadoopPartitionsSpec
* firehose doc adjustments
* fix typo
* additional information on parser types in ingestion docs
* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages
* fixit
* make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers
* transforms
* fix tests
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.
* Fix licenses and dependencies
* Fix licenses and dependencies again
* Fix integration test dependency
* Address review comments
* Fix unit test dependencies
* Fix integration test dependency
* Fix integration test dependency again
* Fix integration test dependency third time
* Fix integration test dependency fourth time
* Fix compile error
* Fix assert package
* add CachingClusteredClient benchmark, refactor some stuff
* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select
* adjust javadoc
* adjustments
* oops
* use it
* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments
* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
* doc updates and changes to use the CollectionUtils.mapValues utility method
* Add Structural Search patterns to intelliJ
* refactoring from PR comments
* put -> putIfAbsent
* do single key lookup
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.
- Introduce a new SqlBenchmark geared towards benchmarking a wide
variety of SQL queries. Rename the old SqlBenchmark to
SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.
* Query vectorization.
This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:
- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.
There are some noticeable differences between vectorized and regular
execution:
- Unlike regular cursors, vector cursors do not understand time
granularity. They expect query engines to handle this on their own,
which a new VectorCursorGranularizer class helps with. This is to
avoid too much batch-splitting and to respect the fact that vector
selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
for filters that might partially support them (like an OR of one
filter that supports indexing and another that doesn't). I'm not sure
that this behavior is desirable anyway (it is potentially too eager)
but, at any rate, it'd be better to harmonize it between the two
classes. Potentially they should both do some different thing that
is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
they use a morphing binary-then-linear search to find their start and
end rows, rather than linear search.
Limitations in this patch are:
- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
"like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
"doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
functions or filtered dimension specs).
Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.
Testing should be expanded in the future -- a project in and of itself.
Related to #3011.
* WIP
* Adjustments for unused things.
* Adjust javadocs.
* DimensionDictionarySelector adjustments.
* Add "clone" to BatchIteratorAdapter.
* ValueMatcher javadocs.
* Fix benchmark.
* Fixups post-merge.
* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.
* BloomDimFilterSqlTest: Tag two non-vectorizable tests.
* Minor adjustments.
* Update surefire, bump up Xmx in Travis.
* Some more adjustments.
* Javadoc adjustments
* AggregatorAdapters adjustments.
* Additional comments.
* Remove switching search.
* Only missiles.
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.
Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
Assert.assertEquals() were flipped)
* more sql support for expression array functions
* prepend/slice
* doc fixes
* fix imports
* fix tests
* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval
* re-arrange
* imports :(
* add append/prepend test
* array support for expression language for multi-value string columns
* fix tests?
* fixes
* more tests
* fixes
* cleanup
* more better, more test
* ignore inspection
* license
* license fix
* inspection
* remove dumb import
* more better
* some comments
* add expr rewrite for arrayfn args for more magic, tests
* test stuff
* more tests
* fix test
* fix test
* castfunc can deal with arrays
* needs more empty array
* more tests, make cast to long array more forgiving
* refactor
* simplify ExprMacro Expr implementations with base classes in core
* oops
* more test
* use Shuttle for Parser.flatten, javadoc, cleanup
* fixes and more tests
* unused import
* fixes
* javadocs, cleanup, refactors
* fix imports
* more javadoc
* more javadoc
* more
* more javadocs, nonnullbydefault, minor refactor
* markdown fix
* adjustments
* more doc
* move initial filter out
* docs
* map empty arg lambda, apply function argument validation
* check function args at parse time instead of eval time
* more immutable
* more more immutable
* clarify grammar
* fix docs
* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
* Add state and error tracking for seekable stream supervisors
* Fixed nits in docs
* Made inner class static and updated spec test with jackson inject
* Review changes
* Remove redundant config param in supervisor
* Style
* Applied some of Jon's recommendations
* Add transience field
* write test
* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()
* remove transience reporting and fix SeekableStreamSupervisorStateManager impl
* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests
* remove stateHistory because it wasn't adding much value, some fixes, and add more tests
* fix tests
* code review changes and add HTTP health check status
* fix test failure
* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager
* fixup after merge
* code review changes - add additional docs
* cleanup KafkaIndexTaskTest
* add additional documentation for Kinesis indexing
* remove unused throws class
Recently we've been talking about using SegmentIds as map keys rather than
DataSegments, because its sense of equality is more well-defined. This is
a refactor that does this in the druid-sql module, which mostly involves
DruidSchema and some related classes. It should have no user-visible effects.
* Add checkstyle for "Local variable names shouldn't start with capital"
* Adjust some local variables to constants
* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()