Commit Graph

2144 Commits

Author SHA1 Message Date
Sashidhar Thallam ea4bad7836 Druid SQL EXTRACT time function - adding support for additional Time Units (#8068)
* 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935

* 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds

* Fixing testInformationSchemaTables test

* Fixing failing tests in DruidAvaticaHandlerTest

* Adding cannotVectorize() call before the test

* Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes.

* Adding MILLISECOND in test case

* Adding support DECADE and MILLENNIUM, updating test case and documentation

* Fixing expression eval for DECADE and MILLENIUM
2019-07-19 20:38:32 -07:00
Clint Wylie 03e55d30eb
add CachingClusteredClient benchmark, refactor some stuff (#8089)
* add CachingClusteredClient benchmark, refactor some stuff

* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select

* adjust javadoc

* adjustments

* oops

* use it

* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments

* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
2019-07-18 13:16:28 -07:00
Surekha da16144495 Refactoring to use `CollectionUtils.mapValues` (#8059)
* doc updates and changes to use the CollectionUtils.mapValues utility method

* Add Structural Search patterns to intelliJ

* refactoring from PR comments

* put -> putIfAbsent

* do single key lookup
2019-07-17 23:02:22 -07:00
Clint Wylie 15fbf5983d add Class.getCanonicalName to forbidden-apis (#8086)
* add checkstyle to forbid unecessary use of Class.getCanonicalName

* use forbiddin-api instead of checkstyle

* add space
2019-07-16 15:21:50 -07:00
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Clint Wylie abf9843e2a fail complex type 'serde' registration when registered type does not match expected type (#7985)
* make ComplexMetrics.registerSerde type check on register, resolves #7959

* add test

* simplify

* unused imports :/

* simplify

* burned by imports yet again

* remove unused constructor

* switch to getName

* heh oops
2019-07-11 23:03:15 -07:00
Fokko Driesprong 0aabeb4b1a Enable Spotbugs: MS_OOI_PKGPROTECT (#8022) 2019-07-08 13:17:56 +05:30
Chi Cao Minh 1166bbcb75 Remove static imports from tests (#8036)
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.

Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
  Assert.assertEquals() were flipped)
2019-07-06 09:33:12 -07:00
Gian Merlino 9b499df14e Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities. (#8013)
* GroupBy: Fix improper uses of StorageAdapter#getColumnCapabilities.

1) A usage in "isArrayAggregateApplicable" that would potentially incorrectly use
   array-based aggregation on a virtual column that shadows a real column.
2) A usage in "process" that would potentially use the more expensive multi-value
   aggregation path on a singly-valued virtual column. (No correctness issue, but
   a performance issue.)

* Add addl javadoc.

* ExpressionVirtualColumn: Set multi-value flag.
2019-07-05 13:17:05 -07:00
Clint Wylie 0344a020bb optimize single string dimension expression selector (#8014)
* optimize single string dimension expression selector

* more javadoc

* oops

* fix

* fix it

* import
2019-07-04 16:26:10 -07:00
Clint Wylie e6ba258197 multi-value string expression transformation fix (#8019)
* multi-value string expression transformation fix

* fixes

* more docs and test

* revert unintended doc change

* formatting

* change tostring to print binding identifier

* review fixup

* oops
2019-07-03 23:03:47 -07:00
Clint Wylie c556d44a19
more sql support for expression array functions (#7974)
* more sql support for expression array functions

* prepend/slice

* doc fixes

* fix imports

* fix tests

* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval

* re-arrange

* imports :(

* add append/prepend test
2019-07-02 21:39:26 -07:00
Alexander Saydakov f38a62e949 theta sketch to string post agg (#7937) 2019-06-27 15:09:57 -07:00
Clint Wylie 151edeec3c
expression virtual column selector fix for expressions which produce array types (#7958)
* fix bug in multi-value string expression column selector

* more test

* imports!!

* fixes
2019-06-26 16:57:13 -07:00
Xue Yu b9c6a26c0e Use ComplexMetrics.registerSerde() across the codebase (#7925)
* refactor complexmetric registerserde

* fix error

* feedback address
2019-06-25 11:39:04 +03:00
Fokko Driesprong 82b248cc17 Spotbugs: Enable MS_SHOULD_BE_FINAL (#7946) 2019-06-23 15:42:18 -07:00
Clint Wylie 494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
SandishKumarHN 01881e3a98 Use only com.google.errorprone.annotations.concurrent.GuardedBy, not javax.annotations.concurrent.GuardedBy (#7889) 2019-06-17 15:58:51 +02:00
Clint Wylie 12a1ecfc2b allow sql lookup function to take advantage of injective lookups (#7655) 2019-06-06 14:36:10 -07:00
Himanshu 0493780799
discard filter when processing subtotalsSpec (#7827) 2019-06-04 10:59:22 -07:00
Xue Yu d482da6e9b fix timestamp ceil lower bound bug (#7823) 2019-06-04 01:16:31 -07:00
litao91 55af692b56 Fix repeated expr parsing in ExpressionPostAggregation (#7791)
* Fix repeatedly expr parsing in ExpressionPostAggregation

Change-Id: Ib739fb1cbc460afeb59a255f635305441dc6997b

* Style fix and avoid code copying

Change-Id: I2d6ba3d1ae37f1fb84b6f7eaab5dab817e1980ec

* Lazilly parse expressions in ExpressionVirtualColumn and ExpressionDimFilter

Change-Id: I5ae2bb3ef9a18fbbfb5e0780c86f6bc0039edc83
2019-05-31 20:56:31 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Clint Wylie aaefdb3386 fix group-by v2 BufferArrayGrouper for empty multi-value dimension row (#7794)
* fix groupby v2 BufferArrayGrouper

* better name test

* fix sql compatible null handling array grouper bug

* another test
2019-05-30 12:59:59 -07:00
Roman Leventov 782863ed0f Fix some problems reported by PVS-Studio (#7738)
* Fix some problems reported by PVS-Studio

* Address comments
2019-05-29 11:20:45 -07:00
BIGrey 42cf078843 Fix memory problem (OOM/FGC) when expression is used in metricsSpec (#7716)
* AggregatorUtil should cache parsed expression to avoid memory problem (OOM/FGC) when Expression is used in metricsSpec

* remove debug log check in Parser.parse

* remove cache and use suppliers.memorize
2019-05-27 09:46:17 -07:00
Merlin Lee 26fad7e06a Add checkstyle for "Local variable names shouldn't start with capital" (#7681)
* Add checkstyle for "Local variable names shouldn't start with capital"

* Adjust some local variables to constants

* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
2019-05-23 18:40:28 +02:00
Clint Wylie ffc2397bcd fix AggregatorFactory.finalizeComputation implementations to be ok with null inputs (#7731)
* AggregatorFactory finalizeComputation is nullable with nullable input, make implementations honor this

* fixes
2019-05-22 21:13:09 -07:00
Himanshu fb0c846941
Virtual column updates for exploiting base column internal structure (#7618)
* VirtualColumn updates for exploiting base column internal structure

* unit tests for virtual column interface updates

* groupBy needs to use VirtualizedColumnSelectorFactory if outer query in
nested groupBy has virtual columns.

* fix strict compile checks

* fix teamcity build errors

* add comment explaining useVirtualizedColumnSelectorFactory flag in RowBasedGrouperHelper.createGrouperAccumulatorPair(..)
2019-05-20 17:04:35 -07:00
Clint Wylie c4a4223c9c fix issue where result level cache was recomputing post aggs that were already cached, causing issues with finalizing aggregators (#7708) 2019-05-20 16:51:50 -07:00
Himanshu 8687f424f9
make ComplexColumn an interface and ExtensionPoint (#7633)
* make ComplexColumn an interface and ExtensionPoint

* incorporate review comments

* make ColumnValueSelector @ExtensionPoint

* more java docs

* add close() method to ComplexColumn interface
2019-05-15 20:59:55 -07:00
Fokko Driesprong 2aa9613bed Bump Checkstyle to 8.20 (#7651)
* Bump Checkstyle to 8.20

Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle

Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.

Affected versions: < 8.18

* Oops, missed one

* Oops, missed a few
2019-05-14 11:53:37 -07:00
Alexander Saydakov ca1a6649f6 Datasketches quantiles more post-aggs (#7550)
* rank and CDF post-aggs

* added post-aggs to the module

* added new post-aggs

* moved post-agg IDs

* moved post-agg IDs
2019-05-10 11:46:54 -07:00
Xavier Léauté 1d49364d08 Set direct memory if unable to detect JVM config (#7606)
* Set direct memory if unable to detect JVM config

Java 9 and above prevents us from detecting the maximum available direct
memory.

This change adds a fallback method to use at most 25% of maximum heap
size, which should be a reasonable default.

Unless -XX:MaxDirectMemorySize is set, recent JVMs will default maximum
direct memory to match the maximum heap size, so this should work out of
the box in most cases. For completeness we print instructions in the log
to explain how to adjust settings if necessary.

* skip test rather than succeeding

* reword log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>
2019-05-09 22:30:42 -07:00
Jihoon Son 18e0d6acb4 Fix resultLevelCache for timeseries with grandTotal (#7624)
* Fix resultLevelCache for timeseries with grandTotal

* Address comment

* fix test
2019-05-09 18:11:04 -07:00
Samarth Jain b542bb9f34 TDigest backed sketch aggregators (#7331)
* First set of changes for tDigest histogram

* Add license

* Address code review comments

* Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments.

* Remove synchronization from BufferAggregators. Address code review comments

* Fix typo
2019-05-09 17:22:55 -07:00
Jonathan Wei 1b577c9b1d
Fix exception when using complex aggs with result level caching (#7614)
* Fix exception when using complex aggs with result level caching

* Add test comments

* checkstyle

* Add helper function for getting aggs from cache

* Move method to CacheStrategy

* Revert QueryToolChest changes

* Update test comments
2019-05-09 13:49:11 -07:00
Xavier Léauté f7bfe8f269
Update mocking libraries for Java 11 support (#7596)
* update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support
* update tests to use new easymock interfaces
* fix tests failing due to easymock fixes
* remove dependency on jmockit
* fix race condition in ResourcePoolTest
2019-05-06 12:28:56 -07:00
Gian Merlino f776b94089 AggregatorFactory: Clarify methods that return other AggregatorFactories. (#7293) 2019-04-29 19:27:30 +02:00
Xavier Léauté 30fed78daf Java 9 compatible specialized class compilation (#7477)
* Java 9 compatible specialized class compilation

We currently use Unsafe.defineClass to compile specialized classes,
which has been removed in Java 9 and above. This change switches to
MethodHandles.Lookup.defineClass at runtime, which provides similar
functionality in newer JDK versions.

* add comments

* fix incorrect comment

* add unsafe utility class

* make comments java-doc style

* fix checkstyle errors

* rename unsafe -> unsafeutil

* move defineClass method to utility class

* rename unsafeutil -> unsafeutils to match other utility class names
* remove extra lookup method

* add utiliy class docs

* more comments

* minor comments and formatting
2019-04-29 18:44:28 +02:00
Justin Borromeo 07dd742e35 Fix time-ordered scan queries on realtime segments (#7546)
* Initial commit

* Added test for int to long conversion

* Add appenderator test for realtime scan query

* get rid of todo

* Fix forbidden apis

* Jon's recommendations

* Formatting
2019-04-26 16:12:10 -07:00
Roman Leventov 6fd6e5de89 Make JavaScript and XML errors non-TeamCity errors; Update JavaScript language level to ES6 in IntelliJ settings (#7541)
* Make JavaScript and XML errors non-TeamCity errors; Update JavaScript language level to ES6 in IntelliJ settings

* Add license comment to assembly-2.0.0.xsd

* Add .idea/README.md with comments
2019-04-25 11:21:58 -07:00
Qi Chen b59b9ef8c7 Fix too many dentry cache slab objs#7508. (#7509) 2019-04-19 20:39:50 -07:00
Surekha c2a42e05bb Fix result-level cache for queries (#7325)
* Add SegmentDescriptor interval in the hash while calculating Etag

* Add computeResultLevelCacheKey to CacheStrategy

Make HavingSpec cacheable and implement getCacheKey for subclasses
Add unit tests for computeResultLevelCacheKey

* Add more tests

* Use CacheKeyBuilder for HavingSpec's getCacheKey

* Initialize aggregators map to avoid NPE

* adjust cachekey builder for HavingSpec to ignore aggregators

* unused import

* PR comments
2019-04-18 13:31:29 -07:00
Justin Borromeo 85f10ed0d0 Support querying realtime segments using time-ordered scan queries and fix broken scan queries without time column (#7454)
* Update scan query runner factory to accept SpecificSegmentSpec

*  nit

* Sorry travis

* Improve logging and fix doc

* Bug fix

* Friendlier error msgs and tests to cover bug

* Address Gian's comments

* Fix doc

* Added tests for empty and null column list

* Style

* Fix checking wrong order (looking at query param when it should be
looking at the null-handled order)

* Add test case for null order

* Fix ScanQueryRunnerTest

* Forbidden APIs fixed
2019-04-12 19:08:34 -07:00
Jonathan Wei 7d9cb6944b Adjust BufferAggregator.get() impls to return copies (#7464)
* Adjust BufferAggregator.get() impls to return copies

* Update BufferAggregator docs, more agg fixes

* Update BufferAggregator get() doc
2019-04-12 19:04:07 -07:00
Justin Borromeo 799c66d9ac Allow max rows and max segments for time-ordered scans to be overridden using the scan query JSON spec (#7413)
* Initial changes

* Fixed NPEs

* Fixed failing spec test

* Fixed failing Calcite test

* Move configs to context

* Validated and added docs

* fixed weird indentation

* Update default context vals in doc

* Fixed allowable values
2019-04-07 20:12:52 -07:00
Clint Wylie 76b4a5c62e refactor lookups to be more chill to router (#7222)
* refactor lookups to be more chill to router

* remove accidental change

* fix and combine LookupIntrospectionResourceTest

* fix inspection

* rename RouterLookupModule to LookupSerdeModule and RouterLookupExtractorFactoryContainerProvider to NoopLookupExtractorFactoryContainerProvider

* make comment generic

* use ConfigResourceFilter instead of StateResourceFilter

* fix indentation

* unused import

* another unused import

* refactor some stuff into processing module, split up LookupModule.java classes into their own files
2019-04-05 14:49:41 -07:00
Richard Startin d29a32062f upgrade to RoaringBitmap 0.8.0 and serialise directly to ByteBuffer (#7408) 2019-04-04 13:22:50 -04:00
Clint Wylie a99f0ff450 prefix no-op aggs with "Noop" (#6960) 2019-04-02 15:05:07 -07:00
Justin Borromeo ad7862c58a Time Ordering On Scans (#7133)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Stuff for time-ordered scan query

* Move ScanResultValue timestamp comparator to a separate class for testing

* Licensing stuff

* Change benchmark

* Remove todos

* Added TimestampComparator tests

* Change number of benchmark iterations

* Added time ordering to the scan benchmark

* Changed benchmark params

* More param changes

* Benchmark param change

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* nit

* Decrease segment size for less memory usage

* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order

* Wrote more tests for scan result value sort

* Committing a param change to kick teamcity

* Fixed codestyle and forbidden API errors

* .

* Improved conciseness

* nit

* Created an error message for when someone tries to time order a result
set > threshold limit

* Set to spaces over tabs

* Fixing tests WIP

* Fixed failing calcite tests

* Kicking travis with change to benchmark param

* added all query types to scan benchmark

* Fixed benchmark queries

* Renamed sort function

* Added javadoc on ScanResultValueTimestampComparator

* Unused import

* Added more javadoc

* improved doc

* Removed unused import to satisfy PMD check

* Small changes

* Changes based on Gian's comments

* Fixed failing test due to null resultFormat

* Added config and get # of segments

* Set up time ordering strategy decision tree

* Refactor and pQueue works

* Cleanup

* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues

* WIP

* Sequence stuff is so dirty :(

* Fixed bug introduced by replacing deque with list

* Wrote docs

* Multi-historical setup works

* WIP

* Change so batching only occurs on broker for time-ordered scans

Restricted batching to broker for time-ordered queries and adjusted
tests

Formatting

Cleanup

* Fixed mistakes in merge

* Fixed failing tests

* Reset config

* Wrote tests and added Javadoc

* Nit-change on javadoc

* Checkstyle fix

* Improved test and appeased TeamCity

* Sorry, checkstyle

* Applied Jon's recommended changes

* Checkstyle fix

* Optimization

* Fixed tests

* Updated error message

* Added error message for UOE

* Renaming

* Finish rename

* Smarter limiting for pQueue method

* Optimized n-way merge strategy

* Rename segment limit -> segment partitions limit

* Added a bit of docs

* More comments

* Fix checkstyle and test

* Nit comment

* Fixed failing tests -> allow usage of all types of segment spec

* Fixed failing tests -> allow usage of all types of segment spec

* Revert "Fixed failing tests -> allow usage of all types of segment spec"

This reverts commit ec470288c7.

* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"

This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.

* Check type of segment spec before using for time ordering

* Fix bug in numRowsScanned

* Fix bug messing up count of rows

* Fix docs and flipped boolean in ScanQueryLimitRowIterator

* Refactor n-way merge

* Added test for n-way merge

* Refixed regression

* Checkstyle and doc update

* Modified sequence limit to accept longs and added test for long limits

* doc fix

* Implemented Clint's recommendations
2019-03-28 14:37:09 -07:00
Justin Borromeo c7fea6ac8f Added better QueryInterruptedException error message for UnsupportedOperationException (#7248)
* Added error message for UOE

* Updated docs

* Doc change

* Doc change
2019-03-26 15:20:24 -07:00
Roman Leventov bca40dcdaf
Fix some IntelliJ inspections (#7273)
Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.
2019-03-25 21:11:01 -03:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Furkan KAMACI 7ada1c49f9 Prohibit Throwables.propagate() (#7121)
* Throw caught exception.

* Throw caught exceptions.

* Related checkstyle rule is added to prevent further bugs.

* RuntimeException() is used instead of Throwables.propagate().

* Missing import is added.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.

* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.

* Throwable is kept before firing a Runtime Exception.

* Fix unused assignments.
2019-03-14 18:28:33 -03:00
Furkan KAMACI 48bc523bdf Locale problem is fixed which fails tests. (#7120)
* Locale problem is fixed which fails tests.

* Forbidden apis definition is improved to prevent using com.ibm.icu.text.SimpleDateFormat and com.ibm.icu.text.DateFormatSymbols without using any Locale defined.

* Error message is improved.
2019-03-13 18:47:14 -03:00
Gian Merlino 98a1b5537f Fix time-extraction topN with non-STRING outputType. (#7257)
Similar to other bugs fixed in #6220, but this one was missed. This bug would
cause "extraction" dimensionSpecs on the "__time" column with non-STRING
outputTypes to potentially be output as STRING sometimes instead of LONG,
causing incompletely merged results.
2019-03-13 13:53:07 -07:00
Gian Merlino 4290e5ae7a Cache selectors in QueryableIndexColumnSelectorFactory. (#7216)
For selectors with internal caches (like SingleScanTimeDimensionSelector,
SingleLongInputCachingExpressionColumnValueSelector, etc) we can get a perf
boost and memory usage decrease by sharing selectors.
2019-03-11 11:33:01 -07:00
Jihoon Son 9bebf113ba
Fix race in historical when loading segments in parallel (#7203)
* Fix race in historical when loading segments in parallel

* revert unnecessary change

* remove synchronized

* add reference counting locking

* fix build

* fix comment
2019-03-08 17:54:05 -08:00
Jonathan Wei 5486c2abf8
Update LICENSE and NOTICE files (#7026)
* Update LICENSE and NOTICE files

* Update react-table version
2019-03-04 18:45:22 -08:00
Clint Wylie 9fa649b3bd segment metadata fallback analysis if no bitmaps (#7116)
* segment metadata fallback analysis if no bitmaps

* remove accidental line

* remove nonsense size estimation

* less ternary

* fix it

* do the thing
2019-02-26 11:27:41 -08:00
Himanshu Pandey 8b803cbc22 Added checkstyle for "Methods starting with Capital Letters" (#7118)
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.

* Un-abbreviate the method names in the calcite tests

* Fixed checkstyle errors

* Changed asserts position in the code
2019-02-23 20:10:31 -08:00
Justin Borromeo c7eeeabf45 2528 Replace Incremental Index Global Flags with Getters (#7043)
* Eliminated reportParseExceptions and deserializeComplexMetrics

* Removed more global flags

* Cleanup

* Addressed Surekha's recommendations
2019-02-15 13:36:46 -08:00
Jonathan Wei 1f29940811
Fix momentsketch build issues (#7074)
* Fix momentsketch build issues

* Remove unused section in pom

* Fix test

* Remove unused method

* Checkstyle
2019-02-13 21:32:43 -08:00
Edward Gan 90c1a54b86 Moments Sketch custom aggregator (#6581)
* Moments Sketch Integration with Druid

* updates, add documentation, fix warnings

* nits

* disallowed base64

* update to druid 0.14
2019-02-13 14:03:47 -08:00
Jihoon Son c9f21bc782 Fix filterSegments for TimeBoundary and DataSourceMetadata queries (#7023)
* Fix filterSegments for TimeBoundary and DataSourceMetadata queries

* add javadoc

* fix build
2019-02-08 10:03:02 -08:00
Jonathan Wei fafbc4a80e
Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Justin Borromeo 6723243ed2 Create Scan Benchmark (#6986)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Remove todos

* Change number of benchmark iterations

* Changed benchmark params

* More param changes

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* Decrease segment size for less memory usage

* Committing a param change to kick teamcity
2019-02-06 14:45:01 -08:00
Jonathan Wei 8bc5eaa908
Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Roman Leventov 0e926e8652 Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898)
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool

* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java

* Remove unnecessary comment

* Checkstyle

* Fix getFromExtensions()

* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization

* Fix UriCacheGeneratorTest

* Workaround issue with MaterializedViewQueryQueryToolChest

* Strengthen Appenderator's contract regarding concurrency
2019-02-04 09:18:12 -08:00
Roman Leventov f7df5fedcc Add several missing inspectRuntimeShape() calls (#6893)
* Add several missing inspectRuntimeShape() calls

* Add lgK to runtime shapes
2019-01-31 20:04:26 -08:00
Furkan KAMACI 30ec608038 Fix mixed up segment ids at SelectBinaryFnTest.java (#6946) 2019-01-30 20:04:16 -08:00
Clint Wylie de810286cd fix bug with expression virtual column selectors backed by a single long column (#6957)
* fix issue with SingleLongInputCachingExpressionColumnValueSelector when sql compatible null handling enabled

* add test with doubles to show same behavior for floats/doubles that lack the optimization of longs

* simplify

* fix import
2019-01-30 10:13:07 -05:00
Clint Wylie a6d81c0d16 Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397)
* blooming aggs

* partially address review

* fix docs

* minor test refactor after rebase

* use copied bloomkfilter

* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests

* add methods to BloomKFilter to get number of set bits, use in comparator, fixes

* more docs

* fix

* fix style

* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets

* oof, more fixes

* more sane docs example

* fix it

* do the right thing in the right place

* formatting

* fix

* avoid conflict

* typo fixes, faster comparator, docs for comparator behavior

* unused imports

* use buffer comparator instead of deserializing

* striped readwrite lock for buffer agg, null handling comparator, other review changes

* style fixes

* style

* remove sync for now

* oops

* consistency

* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception

* CardinalityBufferAggregator inspect selectors instead of selectorPluses

* fix style

* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments

* adjustment

* fix teamcity error?

* rename nil aggs to empty, change empty agg constructor signature, add comments

* use stringutils base64 stuff to be chill with master

* add aggregate combiner, comment
2019-01-29 20:05:17 +07:00
Benedict Jin 72a571fbf7 For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913)
* * Add few methods about base64 into StringUtils

* Use `java.util.Base64` instead of others

* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis

* Rename encodeBase64String & decodeBase64String

* Update druid-forbidden-apis
2019-01-25 17:32:29 -08:00
Himanshu Pandey e1033bb412 Issue#6892- Replaced Math.random() with ThreadLocalRandom.current().nextDouble() (#6914)
*  Replacing Math.random() with ThreadLocalRandom.current().nextDouble()

* Added java.lang.Math#random() in forbidden-apis.txt

* Minor change in the message - druid-forbidden-apis.txt
2019-01-25 19:49:20 +08:00
Clint Wylie 66f64cd8bd fix long/float/double dimension filtering for columns with nulls (#6906)
* fix long,float, double dimension filtering when sql compatible null handling is enabled and the column has null values

* revert unintended change

* fix tests
2019-01-23 22:36:52 -08:00
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Jihoon Son cc06e7e2df Fix fallback to cursor-based plan in UseIndexesStrategy (#6875)
* Fix fallback to cursor-based plan in UseIndexesStrategy

* fix build

* add a comment
2019-01-18 10:41:01 +08:00
Jonathan Wei 68f744ec0a
Fixed buckets histogram aggregator (#6638)
* Fixed buckets histogram aggregator

* PR comments

* More PR comments

* Checkstyle

* TeamCity

* More TeamCity

* PR comment

* PR comment

* Fix doc formatting
2019-01-17 14:51:16 -08:00
Dayue Gao 5b8a221713 Add SQL id, request logs, and metrics (#6302)
* use SqlLifecyle to manage sql execution, add sqlId

* add sql request logger

* fix UT

* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

* add docs and more sql request logger impls

* add UT for http and jdbc

* fix forbidden use of com.google.common.base.Charsets

* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId

* do not use default method in QueryMetrics interface

* capitalize 'sql' everywhere in the non-property parts of the docs

* use RequestLogger interface to log sql query

* minor bugfixes and add switching request logger

* add filePattern configs for FileRequestLogger

* address review comments, adjust sql request log format

* fix inspection error

* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
2019-01-15 23:12:59 -08:00
Charles Allen 5d2947cd52 Use Guava Compatible immediate executor service (#6815)
* Use multi-guava version friendly direct executor implementation

* Don't use a singleton

* Fix strict compliation complaints

* Copy Guava's DirectExecutor

* Fix javadoc

* Imports are the devil
2019-01-11 10:42:19 -08:00
Richard Startin 99097617a1 use RoaringBitmapWriter for RoaringBitmap construction (#6764) 2019-01-08 17:18:41 -08:00
dongyifeng def823124c add version comparator for StringComparator (#6745)
* add version comparator for StringComparator

* add more test case and docs
2019-01-08 17:17:03 -08:00
Clint Wylie 9505074530 fix log typo (#6755)
* fix log typo, add DataSegmentUtils.getIdentifiersString util method

* fix indecisive oops
2018-12-18 15:10:25 -08:00
Clint Wylie 486c6f3cf9 emit logs that are only useful for debugging at debug level (#6741)
* make logs that are only useful for debugging be at debug level so log volume is much more chill

* info level messages for total merge buffer allocated/free

* more chill compaction logs
2018-12-17 14:20:28 +08:00
Atul Mohan 86e3ae5b48 Add fail message (#6720) 2018-12-11 08:05:50 -08:00
Gian Merlino b7709e1245 FileUtils: Sync directory entry too on writeAtomically. (#6677)
* FileUtils: Sync directory entry too on writeAtomically.

See the fsync(2) man page for why this is important:
https://linux.die.net/man/2/fsync

This also plumbs CompressionUtils's "zip" function through
writeAtomically, so the code for handling atomic local filesystem
writes is all done in the same place.

* Remove unused import.

* Avoid FileOutputStream.

* Allow non-atomic writes to overwrite.

* Add some comments. And no need to flush an unbuffered stream.
2018-12-08 17:12:59 +01:00
Furkan KAMACI bbb283fa34 Double-checked locking bugs (#6662)
* Double-checked locking bug is fixed.

* @Nullable is removed since there is no need to use along with @MonotonicNonNull.

* Static import is removed.

* Lazy initialization is implemented.

* Local variables used instead of volatile ones.

* Local variables used instead of volatile ones.
2018-12-07 17:10:29 +01:00
Jihoon Son d525e5b18e Fix travis timeout in BufferHashGrouperTest (#6713)
* Fix travis timeout in BufferHashGrouperTest

* adjust buffer size

* adjust bufferSize and loadFactor

* increase memory

* add debug code

* cat error

* after script

* print logs

* print per 2 min

* use direct mem

* clean up
2018-12-07 12:05:27 +08:00
Atul Mohan ec36f0b82f Add default comparison to HavingSpecMetricComparator for custom Aggregator types (#6505)
* Add default comparison

* Switch to BigDecimal comparison

* Add comparator from AggFactory

* Fix indent

* Add tests
2018-12-04 13:35:13 -08:00
Clint Wylie a1c9d0add2 autosize processing buffers based on direct memory sizing by default (#6588)
* autosize processing buffers based on direct memory sizing

* remove oops, more test

* max 1gb autosize buffers, test, start of docs

* fix oops

* revert accidental change

* print buffer size in exception

* change the things
2018-12-03 18:40:02 -07:00
Roman Leventov ec38df7575
Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606)
* Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns

* Fix style

* Fix HttpEmitterTest.timeoutEmptyQueue()

* Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests

* Clarify code
2018-12-01 01:12:56 +01:00
Mingming Qiu 849ba867b2 fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory (#6656) 2018-11-27 15:59:58 -08:00
Roman Leventov 887c645675
Find duplicate lines with checkstyle; enable some duplicate inspections in IntelliJ (#6558)
Not putting this to 0.13 milestone because the found bugs are not critical (one is a harmless DI config duplicate, and another is in a benchmark.

Change in `DumpSegment` is just an indentation change.
2018-11-26 16:55:42 +01:00
Roman Leventov 87b96fb1fd
Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Gian Merlino fe69da0d95
Expressions: Fix improper supplier reuse with missing columns. (#6600)
* Expressions: Fix improper supplier reuse with missing columns.

ExpressionSelectors has an optimization that skips building a Map
when there is only one input supplier. However, this optimization
should not be used in the case where the is one input supplier but
more than one input identifier (which can happen when only one
input identifier corresponds to an actual column).

Fixes #6556.

* Add underscores to statics.
2018-11-15 22:13:32 -08:00
David Lim 7b41e23cbb
remove backpressure time from DefaultQueryMetrics pending on-going discussion (#6631) 2018-11-15 19:29:50 -07:00
Roman Leventov 8f3fe9cd02 Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607)
* Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies

* Fix bug

* Replace checkstyle regexp with IntelliJ inspection
2018-11-15 13:21:34 -08:00
Jihoon Son cdae2fe7b5 Deprecate IntervalChunkingQueryRunner (#6591)
* Deprecate IntervalChunkingQueryRunner

* add doc

* deprecate metric

* fix doc
2018-11-14 06:33:27 +08:00