Commit Graph

1722 Commits

Author SHA1 Message Date
Roman Leventov d400f23791 Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798) (#4079)
* Monomorphic processing of topN queries with simple double aggregators and historical segments

* Add CalledFromHotLoop annocations to specialized methods in SimpleDoubleBufferAggregator

* Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype

* Fix a bug in SpecializationService

* In SpecializationService, emit maxSpecializations warning only once

* Make GenericIndexed.theBuffer final

* Address comments

* Newline

* Reapply 439c906 (Make GenericIndexed.theBuffer final)

* Remove extra PooledTopNAlgorithm.capabilities field

* Improve CachingIndexed.inspectRuntimeShape()

* Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape()

* Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts

* Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop

* Make ValueMatcher to implement HotLoopCallee

* Doc fix

* Fix inspectRuntimeShape() impl in ExpressionSelectors

* INFO logging of specialization events

* Remove modificator

* Fix OrFilter

* Fix AndFilter

* Refactor PooledTopNAlgorithm.scanAndAggregate()

* Small refactoring

* Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations

* Don't care about runtime shape in tests

* Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests

* Doc wording

* Address comments

* Remove MagicAccessorBridge and ensure Offset subclasses are public

* Attach error message to element
2017-05-16 16:19:55 -07:00
Roman Leventov b7a52286e8 Make @Override annotation obligatory (#4274)
* Make MissingOverride an error

* Make travis stript to fail fast

* Add missing Override annotations

* Comment
2017-05-16 13:30:30 -05:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
Benedict Jin e823085866 Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135) 2017-05-17 01:38:51 +09:00
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
Roman Leventov 1ebfa22955 Update Error prone configuration; Fix bugs (#4252)
* Make Errorprone the default compiler

* Address comments

* Make Error Prone's ClassCanBeStatic rule a error

* Preconditions allow only %s pattern

* Fix DruidCoordinatorBalancerTester

* Try to give the compiler more memory

* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now

* Don't show compiler warnings

* Try different travis script

* Fix travis.yml

* Make Error Prone optional again

* For error-prone compiler

* Increase compiler's maxmem

* Don't run Error Prone for benchmarks because of OOM

* Skip install step in Travis

* Remove MetricHolder.writeToChannel()

* In travis.yml, check compilation before tests, because it may fail faster
2017-05-12 15:55:17 +09:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Himanshu 462f6482df optionally add extensions to explicitly specified hadoopContainerClassPath (#4230)
* optionally add extensions to explicitly specified hadoopContainerClassPath

* note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly
2017-05-08 14:24:14 -05:00
Roman Leventov 8277284d67 Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239) 2017-05-04 11:14:41 -07:00
Roman Leventov 5e85fcc0f5 Restore BaseQuery.computeOverridenContext() for compatibility (#4241) 2017-05-02 10:22:02 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
Roman Leventov b9fd30e90a Add Checkstyle check to prohibit IntelliJ-style commented code lines (#4220)
* Add Checkstyle check to prohibit IntelliJ-style commented code lines

* Address comment

* Restore issue link
2017-04-27 18:11:25 -07:00
kaijianding c47cfed0ec Significantly improve LongEncodingStrategy.AUTO build performance (#4215)
* Significantly improve LongEncodingStrategy.AUTO build performance

* use numInserted instead of tempIn.available

* fix bug
2017-04-27 15:11:07 +03:00
Roman Leventov ee9b5a619a Fix bugs in query builders and in TimeBoundaryQuery.getFilter() (#4131)
* Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids

* Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter()

* Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner

* Add compatibility constructor to BaseQuery

* Remove Query.queryMetrics property

* Move nullToNoopLimitSpec() method to LimitSpec interface

* Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder
2017-04-25 16:32:02 -05:00
kaijianding 336089563d skip rows which are added after cursor created (#4049)
* fix can't get dim value via IncrementalIndexStorageAdapter cursor

* address the comment

* add ut

* address ut comments

* fix bug and fix ut
2017-04-26 03:26:46 +09:00
Jonathan Wei 723a855ab9 Fix nested groupBys with outer exfns on inner numeric columns (#4182) 2017-04-21 19:47:46 -07:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Gian Merlino 60caa641f3 Restore backwards compatibility of Query. (#4185) 2017-04-19 19:47:50 +03:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Ram iyer 2e9589215e removing unused var (#4163) 2017-04-13 04:03:41 +09:00
kaijianding 676af79044 don't do postAgg in TimeseriesQueryQueryToolChest when not necessary (#4155)
* don't do postAgg in TimeseriesQueryQueryToolChest when not necessary

* set postAggs to empty list in TimeseriesQueryQueryToolChest instead of checking finalizing fn

* fix ut

* fix ut again
2017-04-12 15:46:46 +05:30
Roman Leventov 15f3a94474 Copy closer into Druid codebase (fixes #3652) (#4153) 2017-04-10 09:38:45 +09:00
Roman Leventov 73d9b31664 GenericIndexed minor bug fixes, optimizations and refactoring (#3951)
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream

* Fixes

* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils

* Move constructors

* Add GenericIndexedBenchmark

* Comments

* Typo

* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe

* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap

* Optimize TableLongEncodingWriter

* Add checks to SerializerUtils methods

* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()

* Update GenericIndexedBenchmark

* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian

* Add GenericIndexedBenchmark.indexOf()

* More checks in methods in SerializerUtils

* Use helperBuffer.arrayOffset()

* Optimizations in SerializerUtils
2017-03-27 14:17:31 -05:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Erik Dubbelboer 2cbc4764f8 Comparing dimensions to each other in a filter (#3928)
Comparing dimensions to each other using a select filter
2017-03-23 18:23:46 -07:00
Roman Leventov 4b5ae31207 QueryMetrics: abstraction layer of query metrics emitting (part of #3798) (#3954)
* QueryMetrics: abstraction layer of query metrics emitting

* Minor fixes

* QueryMetrics.emit() for bulk emit and improve Javadoc

* Fixes

* Fix

* Javadoc fixes

* Typo

* Use DefaultObjectMapper

* Add tests

* Address PR comments

* Remove QueryMetrics.userDimensions(); Rename QueryMetric.register() to report()

* Dedicated TopNQueryMetricsFactory, GroupByQueryMetricsFactory and TimeseriesQueryMetricsFactory

* Typo

* More elaborate Javadoc of QueryMetrics

* Formatting

* Replace QueryMetric enum with lambdas

* Add comments and VisibleForTesting annotations
2017-03-23 17:23:59 -07:00
Jonathan Wei 79f1a1d7f0 Allow float parameters for Bound/Selector/In filters on long columns (#4074)
* Allow float parameters for long filters

* Use BigDecimal intermediate form for string->long conversions

* PR comments

* PR comments
2017-03-23 14:18:05 -07:00
Akash Dwivedi ff7f90b02d relocate method in BufferAggregator. (#4071)
*  relocate method in BufferAggregator.

* Unused import.

* Detailed javadoc.

* using Int2ObjectMap.

* batch relocate.

* Revert batch relocate.

* Unused import.

* code comments.

* code comment.
2017-03-23 13:07:59 -07:00
David Lim f68ba4128f Exclude pagingIdentifiers that don't apply to a datasource (#4078)
* exclude pagingIdentifiers that don't apply to a datasource to support union datasources

* code review changes

* code review changes
2017-03-22 12:32:27 -07:00
Gian Merlino 1f48198607 Fix some query cache key collisions. (#4094)
The query caches generally store dimensions and aggregators positionally, so
appendCacheablesIgnoringOrder could lead to incorrect results being pulled
from the cache.
2017-03-22 11:08:48 -07:00
Gian Merlino 77b6213222 Remove unused Filters.getLongValueMatcher method. (#4086) 2017-03-21 13:46:07 -06:00
Gian Merlino ad477cb454 Fix topNs with extractionFns but no aggregators. (#4070)
The result sets were empty because of an aggs.length > 0 check. I'm not
sure if it was there for any good reason, but there didn't seem to be one.
2017-03-20 11:31:30 -07:00
Roman Leventov 84fe91ba0b Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798) (#3889)
* Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.

* Use Execs.singleThreaded()

* RuntimeShapeInspector to support nullable fields

* Make CalledFromHotLoop annotation Inherited

* Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory

* Close InputStream in SpecializationService

* Formatting

* Test specialized PooledTopNScanners

* Set flags in PooledTopNAlgorithm directly

* Fix tests, dependent on CountAggragatorFactory toString() form

* Fix

* Revert CountAggregatorFactory changes

* Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector

* Remove duplicate RoaringBitmap dependency in the extendedset pom.xml

* Fix

* Treat ByteBuffers specially in StringRuntimeShape

* Doc fix

* Annotate BufferAggregator.init() with CalledFromHotLoop

* Make triggerSpecializationIterationsThreshold an int

* Remove SpecializationService.PerPrototypeClassState.of()

* Add comments

* Limit the amount of specializations that SpecializationService could make

* Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions

* Use more efficient ConcurrentMap's idioms in SpecializationService
2017-03-17 14:44:36 -05:00
Gian Merlino 3ec1877887 Fix BucketExtractionFn on objects that are strings. (#4072) 2017-03-16 22:59:11 -07:00
Charles Allen 805d85afda Allow compilation as Java8 source and target (#3328)
* Allow compilation as Java8 source and target for everything except API

* Remove conditions in tests which assume that we may run with Java 7

* Update easymock to 3.4

* Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration

* Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity()

* Remove java7 special for druid-api
2017-03-14 22:23:47 -06:00
Gian Merlino e5c0dab12c groupBy v2: Better error message when resources are exhausted. (#4046)
* groupBy v2: Better error message when resources are exhausted.

Fixes #4043.

* Fix tests.
2017-03-15 00:37:49 +05:30
Jihoon Son dfe4bda7fd add doc (#4030) 2017-03-10 12:49:20 -08:00
Gian Merlino a5170666b6 groupBy v2: Always merge queries. (#4023)
This fixes #4020 because it means the timestamp will always be included for outermost
queries. Historicals receiving queries from older brokers will think they're
outermost (because CTX_KEY_OUTERMOST isn't set to "false"), so they'll include a
timestamp, so the older brokers will be OK.
2017-03-08 12:47:46 -06:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Gian Merlino 7b9e6c29cd Fix float, long dimension indexer object selectors. (#4012)
Their "convertUnsortedEncodedKeyComponentToActualArrayOrList" methods didn't respect the contract,
which says they should return single values (not array/list) if there is only a single value
to return. This affects the behavior of ObjectColumnSelectors on realtime segments.
2017-03-06 10:01:30 -08:00
Gian Merlino 337f3870d8 Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007)
* Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided.

* Remove unused import.

* Use defaults in cache key.
2017-03-04 17:41:59 -08:00
praveev 67d0ae3271 Let toDateTime call fall through for Duration Granularity (#4001)
* Let toDateTime call fall through for Duration Granularity

Added test for the same.

* Add duration granularity test to GroupByQueryRunnerTest
2017-03-03 13:27:22 -06:00
Himanshu e7e3c2dc5a support singleThreaded flag for groupBy-v2 as well (#3992) 2017-03-03 23:43:06 +05:30
Roman Leventov 81a5f9851f TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990)
* In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir

* Use FileUtils.forceMkdir() across the codebase and remove some unused code

* Fix test

* Fix PullDependencies.run()

* Unused import
2017-03-02 14:05:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Himanshu 8316b4f48f fix TimeDimExtractionFn.apply() under concurrency (#3984) 2017-03-01 13:07:12 -08:00
kaijianding 772de66e79 add filenameBase to log when exceed file size limit to indicate which column it is (#3982) 2017-03-01 13:05:07 -08:00
Gian Merlino cc20133e70 Checkstyle rule to outlaw tabs. (#3988)
Tabs are the worst.
2017-02-28 23:52:53 -08:00
Akash Dwivedi 91344cbe57 Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987)
* Enable GenericIndexed V2 for complex columns.

* SerializerBuilder to use  GenericColumnSerializer.
2017-02-28 22:06:54 -08:00