Commit Graph

1892 Commits

Author SHA1 Message Date
Roman Leventov 63a897c278 Enable most IntelliJ 'Probable bugs' inspections (#4353)
* Enable most IntelliJ 'Probable bugs' inspections

* Fix in RemoteTestNG

* Fix IndexSpec's equals() and hashCode() to include longEncoding

* Fix inspection errors

* Extract global isntance of natural().nullsFirst(); address comments

* Fix

* Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections

* Prohibit Ordering.natural().nullsFirst() using Checkstyle
2017-06-07 09:54:25 -07:00
Roman Leventov b487fa355b More methods in QueryMetrics and TopNQueryMetrics (the last part of #3798) (#4284)
* Add more methods to QueryMetrics and TopNQueryMetrics, add BitmapResultFactory

* Add implementor expectations section to BitmapResultFactory javadoc
2017-06-07 09:49:08 -07:00
kaijianding 551a89bd67 serialize DateTime As Long to improve json serde performance (#4038) 2017-06-06 10:08:51 -07:00
Gian Merlino d22db30db4 VirtualColumns: Block virtual columns with empty names. (#4367)
* VirtualColumns: Block virtual columns with empty names.

* Spelling.
2017-06-06 08:05:47 -07:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
Jonathan Wei b90c28e861 Support limit push down for GroupBy (#3873)
* Support limit push down for GroupBy V2

* Use orderBy spec ordering when applying limit push down

* PR Comments

* Remove unused var

* Checkstyle fixes

* Fix test

* Add comment on non-final variables, fix checkstyle

* Address PR comments

* PR comments

* Remove unnecessary buffer reset

* Fix missing @JsonProperty annotation
2017-06-02 15:39:04 -07:00
praveev 290ed3ab9d Make DateTime timezone aware (#4343)
* Make DateTime timezone aware

* Change unit tests to make DateTime timezone aware for PeriodGranularity
2017-06-02 12:45:52 -07:00
kaijianding 0efd18247b explicitly unmap hydrant files when abandonSegment to recycle mmap memory (#4341)
* fix TestKafkaExtractionCluster fail due to port already used

* explicitly unmap hydrant files when abandonSegment to recyle mmap memory

* address the comments

* apply to AppenderatorImpl
2017-06-01 18:15:30 -05:00
Roman Leventov 50e72c6aea Fix bugs (core) (#4339)
* Fix bugs

* Add test for GoogleDataSegmentPusher.buildPath()

* Exclude extension changes

* Address comments

* Brace
2017-06-02 06:47:59 +09:00
Roman Leventov 78179ef74d Inject QueryMetrics factories via PolyBind (#4336) 2017-05-31 09:07:03 -07:00
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create new pb obj everytime

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
Kamal Gurala dcb07d6958 Option to configure default analysis types in SegmentMetadataQuery (#4259)
* Option to configure default analysis types

* Updated Docs and renamed

* Added serde tests and Null handling

* Fixed Documentation

* Updated implementation

* Updated implementation

* Updated implementation

* Added usingDefaultIntervals in Builder

* Updated implementation

* Updated implementation and added failing test

* filterSegments implementation updated

* Updated imlementation

* Padding

* Add missing Override

* Updated implementation

* Fixed a naming bug

* Fixed bug

* Removed comment
2017-05-26 12:12:39 -07:00
Gian Merlino 1eaa7887bd Fix integer overflow in BufferGrouper. (#4333)
Would have led to out of bounds buffer access with large buffers.
Also added tests using large buffers.
2017-05-25 23:30:20 -07:00
Gian Merlino 2bd4c0930f Fix "quarter" granularity serialization. (#4316) 2017-05-23 10:06:17 -07:00
Gian Merlino 9283807ad7 GroupByQuery: Fix type-spanning comparisons. (#4317)
Jackson deserializes integers sometimes as int and sometimes as long,
depending on how big they are. This leads to ClassCastException
when comparing deserialized values as part of groupBy merging on the
broker.
2017-05-23 10:06:04 -07:00
Gian Merlino 22e5f52d00 Workaround for non-thread-safe use of CardinalityAggregator. (#4304) 2017-05-23 10:33:03 +09:00
Roman Leventov 8ec3a29af0 Don't pass QueryMetrics down in concurrent and async QueryRunners (fixes #4279) (#4288)
* Don't pass QueryMetrics down in concurrent and async QueryRunners

* Rename QueryPlus.threadSafe() to withoutThreadUnsafeState(); Update QueryPlus.withQueryMetrics() Javadocs; Fix generics in MetricsEmittingQueryRunner and CpuTimeMetricQueryRunner; Make DefaultQueryMetrics to fail fast on modifications from concurrent threads
2017-05-22 13:42:09 -05:00
Maksim Logvinenko d45dad2b44 Remove boxing/unboxing in indexer (#4269)
* Remove boxing/unboxing in indexer

* Fix rowIndex visibility

* Cleanup
2017-05-17 19:13:53 -05:00
Roman Leventov d9f423f55d Make QueryMetrics factories configurable (#4268)
* Ensure QueryMetrics factories accept Json ObjectMapper; Make QueryMetrics factories configurable

* Update QueryMetrics Javadocs

* Add javadocs to QueryMetrics factories

* Move queryMetricsFactory defaults to getter methods of config classes
2017-05-17 08:41:59 -07:00
Gian Merlino ddc2e68998 Remove cache keys from HavingSpecs. (#4280)
* Remove cache keys from HavingSpecs.

They weren't used, since they aren't part of the groupBy cache key.
Also, it's good that they weren't used, since many of them had
value truncation bugs.

* Fix imports.

* Fix test.
2017-05-16 22:13:02 -07:00
Roman Leventov d400f23791 Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798) (#4079)
* Monomorphic processing of topN queries with simple double aggregators and historical segments

* Add CalledFromHotLoop annocations to specialized methods in SimpleDoubleBufferAggregator

* Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype

* Fix a bug in SpecializationService

* In SpecializationService, emit maxSpecializations warning only once

* Make GenericIndexed.theBuffer final

* Address comments

* Newline

* Reapply 439c906 (Make GenericIndexed.theBuffer final)

* Remove extra PooledTopNAlgorithm.capabilities field

* Improve CachingIndexed.inspectRuntimeShape()

* Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape()

* Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts

* Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop

* Make ValueMatcher to implement HotLoopCallee

* Doc fix

* Fix inspectRuntimeShape() impl in ExpressionSelectors

* INFO logging of specialization events

* Remove modificator

* Fix OrFilter

* Fix AndFilter

* Refactor PooledTopNAlgorithm.scanAndAggregate()

* Small refactoring

* Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations

* Don't care about runtime shape in tests

* Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests

* Doc wording

* Address comments

* Remove MagicAccessorBridge and ensure Offset subclasses are public

* Attach error message to element
2017-05-16 16:19:55 -07:00
Roman Leventov b7a52286e8 Make @Override annotation obligatory (#4274)
* Make MissingOverride an error

* Make travis stript to fail fast

* Add missing Override annotations

* Comment
2017-05-16 13:30:30 -05:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
Benedict Jin e823085866 Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135) 2017-05-17 01:38:51 +09:00
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
Roman Leventov 1ebfa22955 Update Error prone configuration; Fix bugs (#4252)
* Make Errorprone the default compiler

* Address comments

* Make Error Prone's ClassCanBeStatic rule a error

* Preconditions allow only %s pattern

* Fix DruidCoordinatorBalancerTester

* Try to give the compiler more memory

* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now

* Don't show compiler warnings

* Try different travis script

* Fix travis.yml

* Make Error Prone optional again

* For error-prone compiler

* Increase compiler's maxmem

* Don't run Error Prone for benchmarks because of OOM

* Skip install step in Travis

* Remove MetricHolder.writeToChannel()

* In travis.yml, check compilation before tests, because it may fail faster
2017-05-12 15:55:17 +09:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Himanshu 462f6482df optionally add extensions to explicitly specified hadoopContainerClassPath (#4230)
* optionally add extensions to explicitly specified hadoopContainerClassPath

* note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly
2017-05-08 14:24:14 -05:00
Roman Leventov 8277284d67 Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239) 2017-05-04 11:14:41 -07:00
Roman Leventov 5e85fcc0f5 Restore BaseQuery.computeOverridenContext() for compatibility (#4241) 2017-05-02 10:22:02 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
Roman Leventov b9fd30e90a Add Checkstyle check to prohibit IntelliJ-style commented code lines (#4220)
* Add Checkstyle check to prohibit IntelliJ-style commented code lines

* Address comment

* Restore issue link
2017-04-27 18:11:25 -07:00
kaijianding c47cfed0ec Significantly improve LongEncodingStrategy.AUTO build performance (#4215)
* Significantly improve LongEncodingStrategy.AUTO build performance

* use numInserted instead of tempIn.available

* fix bug
2017-04-27 15:11:07 +03:00
Roman Leventov ee9b5a619a Fix bugs in query builders and in TimeBoundaryQuery.getFilter() (#4131)
* Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids

* Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter()

* Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner

* Add compatibility constructor to BaseQuery

* Remove Query.queryMetrics property

* Move nullToNoopLimitSpec() method to LimitSpec interface

* Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder
2017-04-25 16:32:02 -05:00
kaijianding 336089563d skip rows which are added after cursor created (#4049)
* fix can't get dim value via IncrementalIndexStorageAdapter cursor

* address the comment

* add ut

* address ut comments

* fix bug and fix ut
2017-04-26 03:26:46 +09:00
Jonathan Wei 723a855ab9 Fix nested groupBys with outer exfns on inner numeric columns (#4182) 2017-04-21 19:47:46 -07:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Gian Merlino 60caa641f3 Restore backwards compatibility of Query. (#4185) 2017-04-19 19:47:50 +03:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Ram iyer 2e9589215e removing unused var (#4163) 2017-04-13 04:03:41 +09:00
kaijianding 676af79044 don't do postAgg in TimeseriesQueryQueryToolChest when not necessary (#4155)
* don't do postAgg in TimeseriesQueryQueryToolChest when not necessary

* set postAggs to empty list in TimeseriesQueryQueryToolChest instead of checking finalizing fn

* fix ut

* fix ut again
2017-04-12 15:46:46 +05:30
Roman Leventov 15f3a94474 Copy closer into Druid codebase (fixes #3652) (#4153) 2017-04-10 09:38:45 +09:00
Roman Leventov 73d9b31664 GenericIndexed minor bug fixes, optimizations and refactoring (#3951)
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream

* Fixes

* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils

* Move constructors

* Add GenericIndexedBenchmark

* Comments

* Typo

* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe

* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap

* Optimize TableLongEncodingWriter

* Add checks to SerializerUtils methods

* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()

* Update GenericIndexedBenchmark

* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian

* Add GenericIndexedBenchmark.indexOf()

* More checks in methods in SerializerUtils

* Use helperBuffer.arrayOffset()

* Optimizations in SerializerUtils
2017-03-27 14:17:31 -05:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Erik Dubbelboer 2cbc4764f8 Comparing dimensions to each other in a filter (#3928)
Comparing dimensions to each other using a select filter
2017-03-23 18:23:46 -07:00
Roman Leventov 4b5ae31207 QueryMetrics: abstraction layer of query metrics emitting (part of #3798) (#3954)
* QueryMetrics: abstraction layer of query metrics emitting

* Minor fixes

* QueryMetrics.emit() for bulk emit and improve Javadoc

* Fixes

* Fix

* Javadoc fixes

* Typo

* Use DefaultObjectMapper

* Add tests

* Address PR comments

* Remove QueryMetrics.userDimensions(); Rename QueryMetric.register() to report()

* Dedicated TopNQueryMetricsFactory, GroupByQueryMetricsFactory and TimeseriesQueryMetricsFactory

* Typo

* More elaborate Javadoc of QueryMetrics

* Formatting

* Replace QueryMetric enum with lambdas

* Add comments and VisibleForTesting annotations
2017-03-23 17:23:59 -07:00
Jonathan Wei 79f1a1d7f0 Allow float parameters for Bound/Selector/In filters on long columns (#4074)
* Allow float parameters for long filters

* Use BigDecimal intermediate form for string->long conversions

* PR comments

* PR comments
2017-03-23 14:18:05 -07:00
Akash Dwivedi ff7f90b02d relocate method in BufferAggregator. (#4071)
*  relocate method in BufferAggregator.

* Unused import.

* Detailed javadoc.

* using Int2ObjectMap.

* batch relocate.

* Revert batch relocate.

* Unused import.

* code comments.

* code comment.
2017-03-23 13:07:59 -07:00
David Lim f68ba4128f Exclude pagingIdentifiers that don't apply to a datasource (#4078)
* exclude pagingIdentifiers that don't apply to a datasource to support union datasources

* code review changes

* code review changes
2017-03-22 12:32:27 -07:00
Gian Merlino 1f48198607 Fix some query cache key collisions. (#4094)
The query caches generally store dimensions and aggregators positionally, so
appendCacheablesIgnoringOrder could lead to incorrect results being pulled
from the cache.
2017-03-22 11:08:48 -07:00
Gian Merlino 77b6213222 Remove unused Filters.getLongValueMatcher method. (#4086) 2017-03-21 13:46:07 -06:00
Gian Merlino ad477cb454 Fix topNs with extractionFns but no aggregators. (#4070)
The result sets were empty because of an aggs.length > 0 check. I'm not
sure if it was there for any good reason, but there didn't seem to be one.
2017-03-20 11:31:30 -07:00
Roman Leventov 84fe91ba0b Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798) (#3889)
* Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.

* Use Execs.singleThreaded()

* RuntimeShapeInspector to support nullable fields

* Make CalledFromHotLoop annotation Inherited

* Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory

* Close InputStream in SpecializationService

* Formatting

* Test specialized PooledTopNScanners

* Set flags in PooledTopNAlgorithm directly

* Fix tests, dependent on CountAggragatorFactory toString() form

* Fix

* Revert CountAggregatorFactory changes

* Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector

* Remove duplicate RoaringBitmap dependency in the extendedset pom.xml

* Fix

* Treat ByteBuffers specially in StringRuntimeShape

* Doc fix

* Annotate BufferAggregator.init() with CalledFromHotLoop

* Make triggerSpecializationIterationsThreshold an int

* Remove SpecializationService.PerPrototypeClassState.of()

* Add comments

* Limit the amount of specializations that SpecializationService could make

* Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions

* Use more efficient ConcurrentMap's idioms in SpecializationService
2017-03-17 14:44:36 -05:00
Gian Merlino 3ec1877887 Fix BucketExtractionFn on objects that are strings. (#4072) 2017-03-16 22:59:11 -07:00
Charles Allen 805d85afda Allow compilation as Java8 source and target (#3328)
* Allow compilation as Java8 source and target for everything except API

* Remove conditions in tests which assume that we may run with Java 7

* Update easymock to 3.4

* Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration

* Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity()

* Remove java7 special for druid-api
2017-03-14 22:23:47 -06:00
Gian Merlino e5c0dab12c groupBy v2: Better error message when resources are exhausted. (#4046)
* groupBy v2: Better error message when resources are exhausted.

Fixes #4043.

* Fix tests.
2017-03-15 00:37:49 +05:30
Jihoon Son dfe4bda7fd add doc (#4030) 2017-03-10 12:49:20 -08:00
Gian Merlino a5170666b6 groupBy v2: Always merge queries. (#4023)
This fixes #4020 because it means the timestamp will always be included for outermost
queries. Historicals receiving queries from older brokers will think they're
outermost (because CTX_KEY_OUTERMOST isn't set to "false"), so they'll include a
timestamp, so the older brokers will be OK.
2017-03-08 12:47:46 -06:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Gian Merlino 7b9e6c29cd Fix float, long dimension indexer object selectors. (#4012)
Their "convertUnsortedEncodedKeyComponentToActualArrayOrList" methods didn't respect the contract,
which says they should return single values (not array/list) if there is only a single value
to return. This affects the behavior of ObjectColumnSelectors on realtime segments.
2017-03-06 10:01:30 -08:00
Gian Merlino 337f3870d8 Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007)
* Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided.

* Remove unused import.

* Use defaults in cache key.
2017-03-04 17:41:59 -08:00
praveev 67d0ae3271 Let toDateTime call fall through for Duration Granularity (#4001)
* Let toDateTime call fall through for Duration Granularity

Added test for the same.

* Add duration granularity test to GroupByQueryRunnerTest
2017-03-03 13:27:22 -06:00
Himanshu e7e3c2dc5a support singleThreaded flag for groupBy-v2 as well (#3992) 2017-03-03 23:43:06 +05:30
Roman Leventov 81a5f9851f TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990)
* In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir

* Use FileUtils.forceMkdir() across the codebase and remove some unused code

* Fix test

* Fix PullDependencies.run()

* Unused import
2017-03-02 14:05:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Himanshu 8316b4f48f fix TimeDimExtractionFn.apply() under concurrency (#3984) 2017-03-01 13:07:12 -08:00
kaijianding 772de66e79 add filenameBase to log when exceed file size limit to indicate which column it is (#3982) 2017-03-01 13:05:07 -08:00
Gian Merlino cc20133e70 Checkstyle rule to outlaw tabs. (#3988)
Tabs are the worst.
2017-02-28 23:52:53 -08:00
Akash Dwivedi 91344cbe57 Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987)
* Enable GenericIndexed V2 for complex columns.

* SerializerBuilder to use  GenericColumnSerializer.
2017-02-28 22:06:54 -08:00
Jonathan Wei a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Jonathan Wei 58b704c3b4 Don't allow '__time' as a GroupBy output field name (#3967)
* Don't allow '__time' as a GroupBy column field name

* Tweak exception message
2017-02-23 14:39:17 -08:00
kaijianding 7ce05d58bc fix NPE in search query when dimension contains null value (#3968)
* fix NPE when dimension contains null value in search query

* add ut

* search with not existed dimension should always return empty result
2017-02-23 08:07:59 -08:00
Gian Merlino 372b84991c Add virtual columns to timeseries, topN, and groupBy. (#3941)
* Add virtual columns to timeseries, topN, and groupBy.

* Fix GroupByTimeseriesQueryRunnerTest.

* Updates from review comments.
2017-02-22 13:16:48 -08:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Gian Merlino 985203b634 Finalize fields in postaggs (#3957)
* initial commits for finalizeFieldAccess #2433

* fix some bugs to run a query

* change name of method Queries.verifyAggregations to Queries.prepareAggregations

* add Uts

* fix Ut failures

* rebased to master

* address comments and add a Ut for arithmetic post aggregators

* rebased to the master

* address the comment of injection within arithmetic post aggregator

* address comments and introduce decorate() in the PostAggregator interface.

* Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name.

* Fix a code style mismatch.

* Rebased to the master
2017-02-21 16:32:14 -08:00
Gian Merlino a47206eaf8 Ability to filter on virtual columns. (#3942)
This didn't need much other than having BitmapIndexSelector return null from
various methods to trigger cursor based filtering.
2017-02-21 16:03:31 -08:00
Jihoon Son 128274c6f0 Disable caching on brokers for groupBy v2 (#3950)
* Disable caching on brokers for groupBy v2

* Rename parameter

* address comments
2017-02-21 09:49:49 -08:00
Jonathan Wei bc33b68b51 Use GroupBy V2 as default (#3953)
* Use GroupBy V2 as default

* Remove unused line

* Change assert to exception propagation
2017-02-18 07:40:40 -08:00
kaijianding 361d9d9802 fix dynamic schema data can't rollup correctly (#3949)
* fix dynamic schema data can't rollup correctly

* add ut
2017-02-17 15:07:29 -06:00
Akash Dwivedi 797488a677 Removing Integer.MAX column size limit. (#3743)
* Removing Integer.MAX column size limit.

* On demand creation of headerLong, use v2 instead of v3

* Avoid reusing the same object from a previous test.

* Avoid reusing the same object from a previous test part#2

* code formatting.

* GenericIndexed/Writer code review changes.

* GenericIndexed/writer code review requested changes.

* checkIndex() to static

* native endianess for genericIndexedV2, code  review requested changes.

* Formatting

* Hll fix.

* use native endianess during bag size calculation.

* Code review requested changes.

* IOPeon close() changes.

* use different tmp directory path for testing.

* Code review requested changes.
2017-02-16 20:09:43 -06:00
Jihoon Son a459db68b6 Fine grained buffer management for groupby (#3863)
* Fine-grained buffer management for group by queries

* Remove maxQueryCount from GroupByRules

* Fix code style

* Merge master

* Fix compilation failure

* Address comments

* Address comments

- Revert Sequence
- Add isInitialized() to Grouper
- Initialize the grouper in RowBasedGrouperHelper.Accumulator
- Simple refactoring RowBasedGrouperHelper.Accumulator
- Add tests for checking the number of used merge buffers
- Improve docs

* Revert unnecessary changes

* change to visible to testing

* fix misspelling
2017-02-14 12:55:54 -08:00
Gian Merlino af67e8904e PreComputedHyperUniquesSerde: Fix formatting. (#3932) 2017-02-14 09:32:29 -08:00
DaimonPl a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Akash Dwivedi 8854ce018e File.deleteOnExit() (#3923)
* Less use of File.deleteOnExit()
 * removed deleteOnExit from most of the tests/benchmarks/iopeon
 * Made IOpeon closable

* Formatting.

* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
2017-02-13 15:12:14 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Pierre 9ab9feced6 Close all aggregators when closing onHeapIncrementalIndex (#3926)
* Close all aggregators when closing onHeapIncrementalIndex

* Aggregators are now handled as Closeables, remove unnecessary mock in test

* Fix variable shadowing
2017-02-13 15:01:27 -08:00
Jihoon Son 991e2852da Add PostAggregators to generator cache keys for top-n queries (#3899)
* Add PostAggregators to generator cache keys for top-n queries

* Add tests for strings

* Remove debug comments

* Add type keys and list sizes to cache key

* Make post aggregators used for sort are considered for cache key generation

* Use assertArrayEquals()

* Improve findPostAggregatorsForSort()

* Address comments

* fix test failure

* address comments
2017-02-13 12:23:44 -08:00
Parag Jain 33c635aff2 use as() method of base segment in reference counting segment (#3921) 2017-02-09 20:24:47 -06:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Gian Merlino 97765fdfef Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. (#3910)
* Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity.

LikeFilter:
- Reduce code duplication, and simplify methods, at the cost of incurring an extra box
  of ImmutableBitmap into a SingletonImmutableList. I think this is fine, since this
  should be cheap and the code path is not hot (just once per filter).

Filters:
- Make estimateSelectivity public since it seems intended that they be used by Filter
  implementations, and Filters from extensions may want to use them too. Removed
  @VisibleForTesting for the same reason.
- Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since
  predicates aren't involved.

* Address PR comments.

* Remove unused import

* Change List to Collection
2017-02-08 13:46:01 -06:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Jihoon Son ddd8c9ef97 Add filter selectivity estimation for auto search strategy (#3848)
* Add filter selectivity estimation for auto search strategy

* Addressed comments

* Lazy bitmap materialization for bitmap sampling and java docs

* Addressed comments.

- Fix wrong non-overlap ratio computation and added unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>

* Addressed comments

- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()

* Fix test failure and add a test for RangeIntList

* fix code style

* Diabled selectivity estimation for multi-valued dimensions

* Address comment
2017-02-06 11:15:03 -08:00
Parag Jain 8a13a85765 Introduce SegmentizerFactory (#3901)
* Introduce SegmentizerFactory
- that knows how to deserialize specific type of segment
- Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment
- Unit test for the default behavior

* review comments
2017-02-06 10:05:12 -08:00
DaimonPl 93b71e265e Extract HLL related code to separate module (#3900) 2017-02-03 09:45:11 -08:00
Jonathan Wei 182261f713 Allow configurable temp directory for query processing (#3893) 2017-02-02 10:22:28 -08:00
Jonathan Wei e6b95e80aa Remove deprecated Aggregator/AggregatorFactory methods (#3894) 2017-02-01 14:43:18 -08:00
Gian Merlino d3a3b7ba0c Add virtual column types, holder serde, and safety features. (#3823)
* Add virtual column types, holder serde, and safety features.

Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core

Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
  when a new base column is added that happens to have the same name as a
  virtual column.

* Fix ExtractionDimensionSpecs with virtual dimensions.

* Fix unused imports.

* CR comments

* Merge one more time, with feeling.
2017-01-26 18:15:51 -08:00
Roman Leventov 75d9e5e7a7 DimensionSelector-related bug fixes and optimizations (fixes #3799, part of #3798) (#3858)
*  * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils;
 * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality;
 * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers;
 * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector;
 * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher;
 * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn

* Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher

* Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method

* Cache baseIdLookup in ForwardingFilteredDimensionSelector

* Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull)

* Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available

* Doc fixes

* Addressed comments

* Fix

* Fix

* Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector

* throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm
2017-01-25 15:28:27 -08:00
Gian Merlino 3136dfa421 LikeFilter: Read value lazily when doing a prefix-based match. (#3880)
This speeds up cases where we don't actually need to read the value,
such as "LIKE 'foo%'".
2017-01-25 13:22:07 -08:00
Roman Leventov af93a8d189 Sequences refactorings and removed unused code (part of #3798) (#3693)
* Removing unused code from io.druid.java.util.common.guava package; fix #3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617)

* Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close()

* Fix strange code in MetricsEmittingQueryRunner

* Add comment on why YieldingSequenceBase is used in Sequences.withEffect()

* Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders
2017-01-19 20:07:43 -08:00
kaijianding 33ae9dd485 streaming version of select query (#3307)
* streaming version of select query

* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity

* respect query limit within historical

* use constant

* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner

* add some test for scan query

* add scan query document

* fix merge conflicts

* add compactedList resultFormat, this format is better for json ser/der

* respect query timeout

* respect query limit on broker

* use static consts and remove unused code
2017-01-19 16:09:53 -06:00
Slim 558dc365a4 renaming classes to be run by mvn and comment non operational tests (#3847) 2017-01-17 11:59:12 -08:00
Akash Dwivedi dd0c4e2ead Migrating extendedset from Metamarkets. (#3694)
* Migrating extendedset from Metamarkets.

* Notice change

* More details in NOTICE

* NOTICE formatting.

* suppress header checkstlye for extendedset.
2017-01-17 10:10:27 -08:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Jihoon Son c099977a5b Add an option to SearchQuery to choose a search query execution strategy (#3792)
* Add an option to SearchQuery to choose a search query execution strategy.

Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query

* Add SearchStrategy and SearchQueryExecutor

* Address comments

* Rename strategies and set UseIndexesStrategy as the default strategy

* Add a cost-based planner for auto strategy

* Add document

* Fix code style

* apply code style

* apply comments
2017-01-10 18:04:20 -08:00
Gian Merlino 3c63cff57a Remove makeMathExpressionSelector from ColumnSelectorFactory. (#3815)
* Remove makeMathExpressionSelector from ColumnSelectorFactory.

* Add @Nullable annotations in places, fix Number.class check.

* Break up createBindings, add tests.

* Add null check.
2017-01-05 18:06:38 -08:00
Gian Merlino 220ca7ebb6 Ignore DimFilterHavingSpec testConcurrentUsage. (#3814) 2017-01-03 17:43:58 -07:00
Gian Merlino d8702ebece Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797)
* Filters: Use ColumnSelectorFactory directly for building row-based matchers.

* Adjustments based on code review.

- BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing.
- HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec
  not thread safe.
- Renamed rowType to rowSignature.
- Added specializations for time-based vs non-time-based DimensionSelector in RBCSF.
- Added convenience method DimensionHanderUtils.createColumnSelectorPlus.
- Added singleton ZeroIndexedInts.
- Added test cases for DimFilterHavingSpec.

* Make ValueMatcherColumnSelectorStrategy actually use the associated selector.

* Add RangeIndexedInts.

* DimFilterHavingSpec: Fix concurrent usage guard on jdk7.

* Add assertion to ZeroIndexedInts.

* Rename no-longer-volatile members.
2017-01-03 14:30:22 -08:00
Roman Leventov 33800122ad Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631) 2016-12-26 00:35:35 -06:00
Jonathan Wei 0e5bd8b4d4 Add dimension type-based interface for query processing (#3570)
* Add dimension type-based interface for query processing

* PR comment changes

* Address PR comments

* Use getters for QueryDimensionInfo

* Split DimensionQueryHelper into base interface and query-specific interfaces

* Treat empty rows as nulls in v2 groupby

* Reduce boxing in SearchQueryRunner

* Add GroupBy empty row handling to MultiValuedDimensionTest

* Address PR comments

* PR comments and refactoring

* More PR comments

* PR comments
2016-12-21 20:11:37 -07:00
Jonathan Wei 2bfcc8a592 First and Last Aggregator (#3566)
* add first and last aggregator

* add test and fix

* moving around

* separate aggregator valueType

* address PR comment

* add finalize inner query and adjust v1 inner indexing

* better test and fixes

* java-util import fixes

* PR comments

* Add first/last aggs to ITWikipediaQueryTest
2016-12-16 15:26:40 -08:00
Himanshu ed322a4beb remove size from default analysisTypes list for segmentMetadata query (#3773) 2016-12-13 18:01:21 -08:00
Jonathan Wei 880a021a7a Fix missed travis failures from PR 3567 and 2798 (#3761)
* Fix checkstyle failures from PR 3567

* Fix GranularityPathSpecTest compile failure
2016-12-07 19:07:31 -08:00
Erik Dubbelboer bb9e35e1af Add Greatest and Least post aggregations (#3567) 2016-12-07 17:58:23 -08:00
Roman Leventov dc8f814acc Optimize Iterator<ImmutableBitmap> implementation inside Filters.matchPredicate() so that it doesn't emit empty bitmap in the end of the iteration, and make it to follow Iterator contract, that is throw NoSuchElementException from next() if there are no more bitmaps (#3754) 2016-12-07 12:54:09 -08:00
Jonathan Wei d1896a2d62 Disable flush after every ObjectMapper write (#3748) 2016-12-06 16:45:23 -08:00
Gian Merlino b1bac9f2d3 groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740)
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
2016-12-06 16:17:32 -08:00
Himanshu 45da7e48f1 groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672)
* sortByDimsFirst flag for groupBy query

* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>

* fix review comments

* fix review comments regarding removing code duplication of dim/time comparison

* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde

* remove unnecessary system.out.println

* make access static var NATURAL_NULLS_FIRST directly

* further review comments addressing
2016-12-06 09:48:56 -08:00
Navis Ryu c74d267f50 Support virtual column for select query (#2511)
* Support virtual column for select query

* Addressed comments
2016-12-05 15:14:35 -08:00
Gian Merlino b64e06704e Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732) 2016-12-02 15:27:54 -08:00
Gian Merlino f4cc8c2b2f IndexBuilder: Close IncrementalIndex when done. (#3734) 2016-12-02 16:56:34 -06:00
Gian Merlino 353fee79dd Add "asMillis" option to "timeFormat" extractionFn. (#3733)
This is useful for chaining extractionFns that all want to treat time as millis,
such as having a javascript extractionFn after a timeFormat.
2016-12-02 13:45:16 -08:00
Gian Merlino 102375d9bb Add "strlen" extractionFn. (#3731) 2016-12-02 12:08:51 -08:00
Gian Merlino 4c5d10f8a3 Add DimFilterHavingSpec. (#3727)
* Add DimFilterHavingSpec.

* Add test for DimFilterHavingSpec with extractionFns.
2016-12-02 10:04:30 -08:00
Gian Merlino 68735829ca Add, fix equals, hashCode, toString on various classes. (#3723)
* TimeFormatExtractionFn: Add toString.

* InDimFilter: Add toString, allow accepting any Collection of values.

* DimensionTopNMetricSpec: Fix toString.

* InvertedTopNMetricSpec: Add toString.

* HyperUniqueFinalizingPostAggregator: Add equals, hashCode, toString.
2016-11-30 19:00:14 -08:00
Gian Merlino 477e0cab7c Filter fixes and tests (#3724)
* More robust Filter tests.

All Filter tests now exercise the CNF and post-filtering features.

* Fixes to RowBasedValueMatcherFactory and to bound filters.

- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
2016-11-30 16:10:05 -08:00
Gian Merlino 6922d684bf GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686)
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.

These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
2016-11-29 12:35:03 +05:30
Roman Leventov c070b4a816 Fix concurrency defects, remove unnecessary volatiles (#3701) 2016-11-22 16:42:28 -08:00
Roman Leventov 7b56cec3b9 Fix resource leaks (#3702) 2016-11-18 21:21:36 +05:30
Gian Merlino 7e80d1045a Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698)
This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
2016-11-16 20:02:25 -08:00
Keuntae Park 094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Gian Merlino 9ad34a3f03 groupBy v1: Force all dimensions to strings. (#3685)
Fixes #3683.
2016-11-14 09:30:18 -08:00
Jisoo Kim 7c0f462fbc fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666) 2016-11-11 18:46:25 -08:00
Roman Leventov fbbb55f867 Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679)
* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
2016-11-11 17:17:27 -06:00
Akash Dwivedi 3e408497b3 Migrating bytebuffercollections from Metamarkets. (#3647)
* Migrating  bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
2016-11-11 10:51:07 -08:00
Gian Merlino fd5451486c Short-circuiting AndFilter. (#3676)
If any of the bitmaps are empty, the result will be false.
2016-11-11 10:14:56 -08:00
Gian Merlino 657e4512d2 Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660)
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
2016-11-05 11:34:36 -07:00
Gian Merlino 4cbebd0931 SubstringDimExtractionFn, BoundDimFilter: Implement typical style toString. (#3658) 2016-11-04 13:31:47 -07:00
Gian Merlino 600bbd4a17 BucketExtractionFn: Implement hashCode, fix toString. (#3656) 2016-11-04 11:24:02 -07:00
Gian Merlino 8b3c86f41f Fix FilteredAggregatorFactory toString formatting. (#3657) 2016-11-04 11:23:55 -07:00
Gian Merlino 2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
Navis Ryu b99e14e732 Support configuration for handling multi-valued dimension (#2541)
* Support configuration for handling multi-valued dimension

* Addressed comments

* use MultiValueHandling.ofDefault() for missing policy
2016-11-03 22:38:54 -06:00
Navis Ryu e10def32f2 Support string type in math expression (#2836)
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
2016-11-02 21:10:48 -06:00
kaijianding 2961406b90 fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644) 2016-11-02 15:40:07 +05:30
Roman Leventov 4b0d6cf789 Fix resource leaks (ComplexColumn and GenericColumn) (#3629)
* Remove unused ComplexColumnImpl class

* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs

* Use concise try-with-resources syntax in several places

* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter

* Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts thew exceptions
2016-11-02 09:23:52 +05:30
Gian Merlino 45940d6e40 Math expressions support for missing columns. (#3630)
Also add SchemaEvolutionTest to help test this kind of thing.

Fixes #3627 and includes test for #3625.
2016-11-01 09:40:25 -07:00
Gian Merlino 89d9c61894 Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572) 2016-10-31 15:24:30 -07:00
Navis Ryu 3fca3be9ea SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617) 2016-10-30 11:47:29 -07:00
Himanshu 23a8e22836 fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613) 2016-10-28 15:48:33 -07:00
Navis Ryu 898c1c21af More best-effort parse long (#3603)
* More best-effort parse long

* addressed comments
2016-10-25 10:31:51 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Navis Ryu 8b7ff4409a Math expressional parameters for aggregator (#2783)
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820)

* Addressed comments

* addressed comments
2016-10-19 13:58:35 -05:00
Roman Leventov b113a34355 In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587) 2016-10-18 09:06:42 -05:00
Charles Allen 2c5c8198db Make query/cpu/time still report on error (#3535) 2016-10-18 08:26:21 -05:00
Roman Leventov 9611358f0a Small topn scan improvements (#3526)
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
2016-10-17 10:36:19 -07:00
Gian Merlino 285516bede Workaround non-thread-safe use of HLL aggregators. (#3578)
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
2016-10-17 09:39:12 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Roman Leventov 85ac8eff90 Improve performance of IndexMergerV9 (#3440)
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable

* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
2016-10-13 08:28:46 -07:00
Charles Allen 76e77cb610 Make segment creation gauva 14 friendly (#3520) 2016-10-05 15:25:03 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Charles Allen 654e1db309 Add simple test to FunctionalExtractionTest (#3522) 2016-09-28 23:45:15 -07:00
Gian Merlino d5a8a35fec groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502)
- Fix GroupByRowProcessor config overrides
- Fix GroupByRowProcessor resource limit checking
- Invert subquery context overrides such that for the subquery, its own
  keys override keys from the outer query, not the other way around.

The last bit is necessary for the test to work, and seems like a better
way to do it anyway.
2016-09-23 14:41:09 -07:00
Gian Merlino 7195be32d8 groupBy v2: Fix dangling references. (#3500)
Acquiring references in the processing task prevents dangling references
caused by canceled processing tasks.
2016-09-24 01:59:11 +05:30
Gian Merlino f8d71fc602 groupBy: Fix maxMergingDictionarySize config. (#3488) 2016-09-22 10:02:33 -07:00
Gian Merlino c87ecea975 Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487) 2016-09-22 09:12:02 -07:00
Navis Ryu 49c0fe0e8b Show candidate hosts for the given query (#2282)
* Show candidate hosts for the given query

* Added test cases & minor changes to address comments

* Changed path-param to query-pram for intervals/numCandidates
2016-09-22 08:32:38 -07:00
Keuntae Park 54ec4dd584 Support renaming of outputName for cached select and search query results (#3395)
* support renaming of outputName for cached select and search queries

* rebase and resolve conflicts

* rollback CacheStrategy interface change

* updated based on review comments
2016-09-20 08:19:14 -07:00
Charles Allen 95e08b38ea [QTL] Reduced Locking Lookups (#3071)
* Lockless lookups

* Fix compile problem

* Make stack trace throw instead

* Remove non-germane change

* * Add better naming to cache keys. Makes logging nicer
* Fix #3459

* Move start/stop lock to non-interruptable for readability purposes
2016-09-16 11:54:23 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Gian Merlino d108461838 groupBy v2: Parallel disk spilling. (#3433)
In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch
from hash-based partitioning to thread-based partitioning. This stops processing
threads from blocking each other while spilling is occurring.
2016-09-09 16:49:58 -06:00
Gian Merlino 1e3f94237e groupBy v2: Configurable load factor. (#3437)
Also change defaults:

- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
  for most heaps.
2016-09-07 14:14:59 -05:00
Roman Leventov 4f0bcdce36 Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 (#3422)
* Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The exact purpose for this change is to allow running IndexMergeBenchmark in Windows, however should also be universally 'better' than non-deterministic unmapping, done when MappedByteBuffers are garbage-collected (BACKEND-312)

* Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9

* Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources

* Reformat IndexIO
2016-09-07 10:43:47 -07:00
Gian Merlino 8d2ae144a8 groupBy: Short-circuit identity preCompute manipulators. (#3434) 2016-09-06 22:28:44 -06:00
Gian Merlino 1d07964987 LimitedTemporaryStorage: Fix perf bug. (#3432)
FilterOutputStream has an inefficient implementation of write(byte[], int, int).
So let's extend OutputStream directly and use efficient implementations of all
methods.
2016-09-06 15:39:36 -07:00
Gian Merlino 8ed1894488 groupBy: Omit timestamp from merge key when granularity = all. (#3416)
Fixes #3412.
2016-09-01 09:02:54 -07:00
Gian Merlino 6d25c5e053 Avoid materializing all groupBy results with order + limit. (#3410)
The old TopNFunction code did Sequences.toList on the input sequence before
using a priority queue to find the top N items. Now, the priority queue
is used in an accumulator, so there is no need to fully materialize the results.

Also removed equals/hashCode from the limitFn and remove limitFn from the
GroupByQuery's hashCode, since that wasn't necessary and the implementation
of hashCode wasn't correct anyway.
2016-08-31 14:08:07 -07:00
Gian Merlino 1268e2902c Add groupBy test for multiple multi-value dimensions. (#3415) 2016-08-31 11:21:10 -07:00
Gian Merlino e9050c2b4c TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411) 2016-08-31 20:58:53 +05:30
Keuntae Park 0076b5fc1a Interval bug fix for search query (#2903)
* support query granularity and interval for search query

* skip unncessary bitmap calculation when query interval contains whole the data interval of the given segments.

* use binary search to find start and end index for the given interval

* fix based on comment

* bug fix based on the review comments and add unit tests
2016-08-31 20:52:44 +05:30
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Jonathan Wei 4e91330a17 Use DimensionSpec in CardinalityAggregatorFactory (#3406)
* Use DimensionSpec in CardinalityAggregatorFactory

* Address PR comments

* Fix requiredFields()
2016-08-30 15:54:02 -07:00
Gian Merlino b11e9544ea GroupBy v2: Improve hash code distribution. (#3407)
Without this transformation, distribution of hash % X is poor in general.
It is catastrophically poor when X is a multiple of 31 (many slots would
be empty).
2016-08-30 12:09:08 +05:30
kaijianding f037dfcaa4 fix missing segments duplicate retried (#3398) 2016-08-29 23:46:21 +05:30
jaehong choi 2e0f253c32 introducing lists of existing columns in the fields of select queries' output (#2491)
* introducing lists of existing columns in the fields of select queries' output

* rebase master

* address the comment. add test code for select query caching

* change the cache code in SelectQueryQueryToolChest to 0x16
2016-08-25 21:37:53 +05:30
rajk-tetration 362b9266f8 Adding filters for TimeBoundary on backend (#3168)
* Adding filters for TimeBoundary on backend

Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>

* updating TimeBoundaryQuery constructor in QueryHostFinderTest

* add filter helpers

* update filterSegments + test

* Conditional filterSegment depending on whether a filter exists

* Style changes

* Trigger rebuild

* Adding documentation for timeboundaryquery filtering

* added filter serialization to timeboundaryquery cache

* code style changes
2016-08-15 10:25:24 -07:00
Gian Merlino e1b0b7de3e IndexBuilder: Allow replacing rows, customizable maxRows. (#3359) 2016-08-12 15:22:45 -07:00
Jonathan Wei 454587857c Make StringComparator deserialization case-insensitive (#3356) 2016-08-11 18:00:11 -07:00
Himanshu 043562914d Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341) 2016-08-10 11:03:44 -07:00
Gian Merlino 1eb7a7e882 Restore optimizations in BoundFilter. (#3343) 2016-08-10 08:53:17 -07:00
Gian Merlino a2bcd97512 IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344)
They had arrays as values, which MapBasedRow doesn't understand and
toStrings rather than converting to lists.
2016-08-10 08:47:29 -07:00
Jonathan Wei 890e3bdd3f More informative query unit test names (#3342) 2016-08-09 22:24:48 -07:00
Gian Merlino 8899affe48 Introduce standardized "Resource limit exceeded" error. (#3338)
Fixes #3336.
2016-08-09 10:50:56 -07:00
Gian Merlino 21bce96c4c More useful query errors. (#3335)
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.

With this patch, the error fields are:

- error, one of the specific strings "Query interrupted", "Query timeout",
  "Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
  in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
  in the causality chain.
- host, the host that failed the query.
2016-08-09 16:14:52 +08:00
Gian Merlino 1aae5bd67d Nicer handling for cancelled groupBy v2 queries. (#3330)
1. Wrap temporaryStorage in a resource holder, to avoid spurious "Closed"
   errors from already-running processing tasks.
2. Exit early from the merging accumulator if the query is cancelled.
2016-08-05 14:48:06 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00