druid

Commit Graph

Author	SHA1	Message	Date
Shiv Toolsidass	5a894f830b	Added backpressure metric (#6335 ) * Added backpressure metric * Updated channelReadable to AtomicBoolean and fixed broken test * Moved backpressure metric logic to NettyHttpClient * Fix placement of calculating backPressureDuration	2018-09-29 14:24:04 -07:00
Jihoon Son	f09e718c68	Implement MapVirtualColumn.makeDimensionSelector properly (#6396 ) * Implement MapVirtualColumn.makeDimensionSelector properly * address comments	2018-09-29 14:13:05 -07:00
Jihoon Son	faf3f1e426	Fix cache keys of DefaultDimensionSpec and ExtractionDimensionSpec (#6390 )	2018-09-26 20:08:53 -07:00
Nishant Bangarwa	c9d281a2e9	Add ability to pass in Bloom filter from Hive Queries (#6222 ) * Bloom filter initial implementation fix checkstyle review comments Fix wierd failure review comments Revert "Fix wierd failure" This reverts commit a13a83ad7887e679f6d539191b52aeaaea85b613. * fix test * review comment	2018-09-26 16:04:26 -07:00
Jonathan Wei	00b0a156e9	Tweak isInvalidRows behavior in HadoopTuningConfig (#6339 ) * Tweak isInvalidRows behavior in HadoopTuningConfig * Fix tests	2018-09-24 16:13:13 -07:00
Alexander Saydakov	93345064b5	HllSketch module (#5712 ) * HllSketch module * updated license and imports * updated package name * implemented makeAggregateCombiner() * removed json marks * style fix * added module * removed unnecessary import, side effect of package renaming * use TreadLocalRandom * addressing code review points, mostly formatting and comments * javadoc * natural order with nulls * typo * factored out raw input value extraction * singleton * style fix * style fix * use Collections.singletonList instead of Arrays.asList * suppress warning	2018-09-24 08:41:56 -07:00
Jonathan Wei	609da01882	Fix dictionary ID race condition in IncrementalIndexStorageAdapter (#6340 ) Possibly related to https://github.com/apache/incubator-druid/issues/4937 -------- There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following, when running queries with filters on String dimensions that hit realtime tasks: ``` org.apache.druid.java.util.common.ISE: id[5] >= maxId[5] at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591) at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562) at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284) ``` When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, `StringDimensionIndexer.makeDimensionSelector` gets called eventually, which calls: ``` final int maxId = getCardinality(); ... @Override public int getCardinality() { return dimLookup.size(); } ``` So `maxId` is set to the size of the dictionary at the time that the `filterMatcher` is created. However, the `maxRowIndex` which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049) is set after the `filterMatcher` is created. If rows with new dictionary values are added after the `filterMatcher` is created but before `maxRowIndex` is set, then it is possible for the Cursor to return rows that contain the new values, which will have `id >= maxId`. This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows with unknown dictionary IDs from being passed to the `filterMatcher`. ----------- The included test triggers the error with a custom Filter + DruidPredicateFactory. The DimensionSelector for predicate-based filter matching is created here in `Filters.makeValueMatcher`: ``` public static ValueMatcher makeValueMatcher( final ColumnSelectorFactory columnSelectorFactory, final String columnName, final DruidPredicateFactory predicateFactory ) { final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName); // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns. if (capabilities != null && capabilities.getType() == ValueType.LONG) { return getLongPredicateMatcher( columnSelectorFactory.makeColumnValueSelector(columnName), predicateFactory.makeLongPredicate() ); } final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector = DimensionHandlerUtils.createColumnSelectorPlus( ValueMatcherColumnSelectorStrategyFactory.instance(), DefaultDimensionSpec.of(columnName), columnSelectorFactory ); return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory); } ``` The test Filter adds a row to the IncrementalIndex in the test when the predicateFactory creates a new String predicate, after `DimensionHandlerUtils.createColumnSelectorPlus` is called.	2018-09-18 10:43:29 +04:00
Roman Leventov	0c4bd2b57b	Prohibit some Random usage patterns (#6226 ) * Prohibit Random usage patterns * Fix FlattenJSONBenchmarkUtil	2018-09-14 13:35:51 -07:00
Roman Leventov	d50b69e6d4	Prohibit LinkedList (#6112 ) * Prohibit LinkedList * Fix tests * Fix * Remove unused import	2018-09-13 18:07:06 -07:00
Gian Merlino	d6cbdf86c2	Broker backpressure. (#6313 ) * Broker backpressure. Adds a new property "druid.broker.http.maxQueuedBytes" and a new context parameter "maxQueuedBytes". Both represent a maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Fixes #4933. * Fix query context doc.	2018-09-10 09:33:29 -07:00
Himanshu	d61f708ef5	make COMPLEX column optionally filterable in Druid code (#6223 ) * make COMPLEX column filterable in Druid code * Revert "make COMPLEX column filterable in Druid code" This reverts commit `9fc6ec768c`. * complex columns can be optionally made filterable * some types are always filterable * add ColumnCapabilitiesImpl serde tests * add SuppresedWarnings annotation	2018-09-05 12:28:49 -07:00
Gian Merlino	be6c901114	Like filter: Fix escapes escaping themselves. (#6295 ) Escapes should escape themselves.	2018-09-05 09:29:07 -07:00
Gian Merlino	431d3d8497	Rename io.druid to org.apache.druid. (#6266 ) * Rename io.druid to org.apache.druid. * Fix META-INF files and remove some benchmark results. * MonitorsConfig update for metrics package migration. * Reorder some dimensions in inner queries for some reason. * Fix protobuf tests.	2018-08-30 09:56:26 -07:00
Himanshu	1fae6513e1	add "subtotalsSpec" attribute to groupBy query (#5280 ) * add subtotalsSpec attribute to groupBy query * dont sent subtotalsSpec to downstream nodes from broker and other updates * address review comment * fix checkstyle issues after merge to master * add docs for subtotalsSpec feature * address doc review comments	2018-08-28 17:46:38 -07:00
Dayue Gao	fcf8c8d53c	RowBasedKeySerde should use empty dictionary in constructor (#6256 )	2018-08-28 17:22:18 -07:00
Gian Merlino	4a8b09b6a9	Fix NPE on constant null numeric expressions. (#6232 ) The bug was caused by makeExprEvalSelector returning a null object, which it isn't supposed to do. Fixed this by renaming ConstantColumnValueSelector to ConstantExprEvalSelector (it was only used for ExprEval anyway) and putting logic in that class to make sure the selectors behave as expected.	2018-08-27 15:30:56 -07:00
Gian Merlino	71c1a70ff6	FilteredBufferAggregator: Fix missing relocate, isNull methods. (#6233 )	2018-08-27 15:30:45 -07:00
Gian Merlino	157e75a1fe	Minor followup to #6220 . (#6231 ) Adjustments to comments and usage of generics.	2018-08-27 12:01:44 -05:00
Gian Merlino	cb40b6d369	Fix all inspection errors currently reported. (#6236 ) * Fix all inspection errors currently reported. TeamCity builds on master are reporting inspection errors, possibly because there was a while where it was not running due to the Apache migration, and there was some drift. * Fix one more location. * Fix tests. * Another fix.	2018-08-26 18:36:01 -06:00
Gian Merlino	23ba6f7ad7	Fix four bugs with numeric dimension output types. (#6220 ) * Fix four bugs with numeric dimension output types. This patch includes the following bug fixes: - TopNColumnSelectorStrategyFactory: Cast dimension values to the output type during dimExtractionScanAndAggregate instead of updateDimExtractionResults. This fixes a bug where, for example, grouping on doubles-cast-to-longs would fail to merge two doubles that should have been combined into the same long value. - TopNQueryEngine: Use DimExtractionTopNAlgorithm when treating string columns as numeric dimensions. This fixes a similar bug: grouping on string-cast-to-long would fail to merge two strings that should have been combined. - GroupByQuery: Cast numeric types to the expected output type before comparing them in compareDimsForLimitPushDown. This fixes #6123. - GroupByQueryQueryToolChest: Convert Jackson-deserialized dimension values into the proper output type. This fixes an inconsistency between results that came from cache vs. not-cache: for example, Jackson sometimes deserializes integers as Integers and sometimes as Longs. And the following code-cleanup changes, related to the fixes above: - DimensionHandlerUtils: Introduce convertObjectToType, compareObjectsAsType, and converterFromTypeToType to make it easier to handle casting operations. - TopN in general: Rename various "dimName" variables to "dimValue" where they actually represent dimension values. The old names were confusing. * Remove unused imports.	2018-08-25 14:31:46 -07:00
Himanshu	a76bf9ab2a	add ability to do optional rollup in AggregationTestHelper (#6213 )	2018-08-22 16:38:36 -07:00
Benedict Jin	3647d4c94a	Make time-related variables more readable (#6158 ) * Make time-related variables more readable * Patch some improvements from the code reviewer * Remove unnecessary boxing of Long type variables	2018-08-21 15:29:40 -07:00
Kirill Kozlov	62e580050c	Use JUnit TemporaryFolder rule instead of system temp folder (#6070 ) * Use JUnit TemporaryFolder rule instead of system tmp folder * Allow to forbid apis which present not in all mvn modules	2018-08-16 11:05:45 -07:00
Jihoon Son	ecee3e0a24	Further optimize memory for Travis jobs (#6150 ) * Further optimize memory for Travis jobs * fix build * sudo false	2018-08-10 22:03:36 -07:00
Gian Merlino	3525d4059e	Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108 ) * Cache: Add maxEntrySize config. The idea is this makes it more feasible to cache query types that can potentially generate large result sets, like groupBy and select, without fear of writing too much to the cache per query. Includes a refactor of cache population code in CachingQueryRunner and CachingClusteredClient, such that they now use the same CachePopulator interface with two implementations: one for foreground and one for background. The main reason for splitting the foreground / background impls is that the foreground impl can have a more effective implementation of maxEntrySize. It can stop retaining subvalues for the cache early. * Add CachePopulatorStats. * Fix whitespace. * Fix docs. * Fix various tests. * Add tests. * Fix tests. * Better tests * Remove conflict markers. * Fix licenses.	2018-08-07 10:23:15 -07:00
Clint Wylie	62677212cc	Order rows during incremental index persist when rollup is disabled. (#6107 ) * order using IncrementalIndexRowComparator at persist time when rollup is disabled, allowing increased effectiveness of dimension compression, resolves #6066 * fix stuff from review	2018-08-06 14:17:48 -07:00
Nishant Bangarwa	75c8a87ce1	Part 2 of changes for SQL Compatible Null Handling (#5958 ) * Part 2 of changes for SQL Compatible Null Handling * Review comments - break lines longer than 120 characters * review comments * review comments * fix license * fix test failure * fix CalciteQueryTest failure * Null Handling - Review comments * review comments * review comments * fix checkstyle * fix checkstyle * remove unrelated change * fix test failure * fix failing test * fix travis failures * Make StringLast and StringFirst aggregators nullable and fix travis failures	2018-08-02 08:20:25 -07:00
Jonathan Wei	b9c445c780	Optimize filtered aggs with interval filters in per-segment queries (#5857 ) * Optimize per-segment queries * Always optimize, add unit test * PR comments * Only run IntervalDimFilter optimization on __time column * PR comments * Checkstyle fix * Add test for non __time column	2018-08-01 14:39:38 -07:00
Andrés Gómez	e270362767	Add stringLast and stringFirst aggregators extension (#5789 ) * Add lastString and firstString aggregators extension * Remove duplicated class * Move first-last-string doc page to extensions-contrib * Fix ObjectStrategy compare method * Fix doc bad aggregatos type name * Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery * Add getMaxStringBytes() method to support JSON serialization * Fix null pointer exception at segment creation phase when the string value is null * Control the valueSelector object class on BufferAggregators * Perform all improvements * Add java doc on SerializablePairLongStringSerde * Refactor ObjectStraty compare method * Remove unused ; * Add aggregateCombiner unit tests. Rename BufferAggregators unit tests * Remove unused imports * Add license header * Add class name to java doc class serde * Throw exception if value is unsupported class type * Move first-last-string extension into druid core * Update druid core docs * Fix null pointer exception when pair->string is null * Add null control unit tests * Remove unused imports * Add first/last string folding aggregator on AggregatorsModule to support segment metadata query * Change SerializablePairLongString to extend SerializablePair * Change vars from public to private * Convert vars to primitive type * Clarify compare comment * Change IllegalStateException to ISE * Remove TODO comments * Control possible null pointer exception * Add @Nullable annotation * Remove empty line * Remove unused parameter type * Improve AggregatorCombiner javadocs * Add filterNullValues option at StringLast and StringFirst aggregators * Add filterNullValues option at agg documentation * Fix checkstyle * Update header license * Fix StringFirstAggregatorFactory.VALUE_COMPARATOR * Fix StringFirstAggregatorCombiner * Fix if condition at StringFirstAggregateCombiner * Remove filterNullValues from string first/last aggregators * Add isReset flag in FirstAggregatorCombiner * Change Arrays.asList to Collections.singletonList	2018-08-01 10:52:54 -07:00
Roman Leventov	0754d78a2e	Prohibit Lists.newArrayList() with a single argument (#6068 ) * Prohibit Lists.newArrayList() with a single argument * Test fixes * Add Javadoc to Node constructor	2018-07-31 20:09:10 -07:00
Clint Wylie	20ae8aa626	Fix 'auto' encoded longs + compression serializer (#6045 ) * Fix 'auto' encoded longs + compression serializer Fixes #6044 changes: * Fixes `VSizeLongSerde` serializers to treat 'close' as 'flush' when used with `BlockLayoutColumnarLongsSerializer`, allowing unwritten values to be flushed to the buffer when the block is compressed * Add exhaustive unit test that flexes a variety of value sizes, row counts, and compression strategies to catch issues such as these : * refactor LongSerializer close to be named flush instead * revert and just make new serializers per block	2018-07-30 18:35:20 -07:00
Roman Leventov	f3595c93d9	Fix a bug in GroupByQueryEngine (#6062 )	2018-07-30 14:39:38 -07:00
Gian Merlino	c57f4a5db0	FinalizingFieldAccessPostAggregator: Fix serde. (#6067 ) Fixes #6063.	2018-07-28 08:44:22 -07:00
Benedict Jin	331a0afb98	Remove redundant type parameters and enforce some other style and inspection rules (#5980 ) * Various changes about druid-services module * Patch improvements from reviewer * Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile * Fix ArraysAsListWithZeroOrOneArgument * Fix conflict * Fix ToArrayCallWithZeroLengthArrayArgument * Fix AliEqualsAvoidNull * Remove blank line * Remove unused import clauses * Fix code style in TopNQueryRunnerTest * Fix conflict * Don't use Collections.singletonList when converting the type of array type * Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase * Roll back the latest commit * Add java.io.File#toURL() into druid-forbidden-apis * Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord * Add a new regexp element into stylecode xml file * Fix style error for new regexp * Set the level of ArraysAsListWithZeroOrOneArgument as WARNING * Fix style error for new regexp * Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile * Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR * Add toArray(new Object[0]) regexp into checkstyle config file & fix them * Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it * Add a comment for string equals regexp in checkstyle config * Fix code format * Add RedundantTypeArguments as ERROR level inspection * Fix cannot resolve symbol datasource	2018-07-27 16:56:49 -05:00
kaijianding	7919e4d5df	move rangeSet compare into shardspec (#5688 )	2018-07-26 14:17:57 -07:00
Gian Merlino	04ea3c9f8c	Update license headers. (#5976 ) * Update license headers. For compliance with http://www.apache.org/legal/src-headers.html. * More license adjustments. * Fix mistakenly edited package line.	2018-07-11 09:55:18 -07:00
Gian Merlino	948e73da77	Extend various test timeouts. (#5978 ) False failures on Travis due to spurious timeout (in turn due to noisy neighbors) is a bigger problem than legitimate failures taking too long to time out. So it makes sense to extend timeouts.	2018-07-10 13:02:14 -07:00
Benedict Jin	b3021ec802	Fix bug in SegmentAnalyzer.analyzeComplexColumn() #5939 (#5954 )	2018-07-09 15:36:16 -07:00
Surekha	441c9819d9	Support limit for timeseries query (#5894 ) (#5931 ) * Support limit for timeseries query (#5894) * Fix tests * Address PR comments * Try to fix teamcity inspection checks * Remove unused method from VirtualColumns * Remove unused import statement	2018-07-09 08:58:42 -07:00
Jihoon Son	10a01d6846	[SQL] Fix missing postAggregations for Timeseries and TopN (#5912 ) * [SQL] Fix missing postAggregations for Timeseries and TopN * fix build * fix test	2018-06-29 10:36:55 -07:00
Jonathan Wei	f3e1520360	Fix merge for TrueDimFilter (#5916 ) * Fix merge for TrueDimFilter * remove unused cache ID	2018-06-28 14:46:47 -07:00
scrawfor	bf2a31a5bc	Add new 'true' filter which always returns true. (#5711 ) * Add new 'true' filter which always returns true. * Add support for bitmap index. * Adds documentation. * Removes No-op Filter	2018-06-28 11:52:45 -07:00
zhangxinyu	d857345b7d	add method getRequiredColumns for DimFilter (#5872 ) * add method getRequiredColumns for DimFilter * deal with the NullPointerException when DimFilter is null	2018-06-27 15:45:46 -07:00
陈春斌	7649742943	Use ReentrantReadWriteLock in DimensionDictionary (#5883 )	2018-06-25 12:35:26 -07:00
Nishant Bangarwa	1c031784cb	Align long Aggregator implementation with Double and Float (#5861 ) Add LongMin/Max aggregator combiners Extract common code from LongSum/Min/MaxAggregatorFactories in SimpleLongAggregatorFactory	2018-06-14 01:56:41 +04:00
Gian Merlino	3af95913a9	Lazy-ify IncrementalIndex filtering too. (#5852 ) * Lazy-ify IncrementalIndex filtering too. Follow-up to #5403, which only lazy-ified cursor-based filtering on QueryableIndex. * Fix logic error.	2018-06-06 18:03:34 -07:00
Gian Merlino	78fd27cdb2	Lazy-ify ValueMatcher BitSet optimization for string dimensions. (#5403 ) * Lazy-ify ValueMatcher BitSet optimization for string dimensions. The idea is that if the prior evaluated filters are decently selective, such that they mean we won't see all possible values of the later filters, then the eager version of the optimization is too wasteful. This involves checking an extra bitset, but the overhead is small even if the lazy-ification is useless. * Remove import. * Minor transformation	2018-06-05 09:06:51 -07:00
Clint Wylie	2b45a6a42d	Fix topN lexicographic sort (#5815 ) * fixes #5814 changes: * pass `StorageAdapter` to topn algorithms to get things like if column is 'sorted' or if query interval is smaller than segment granularity, instead of using `io.druid.segment.Capabilities` * remove `io.druid.segment.Capabilities` since it had one purpose, supplying `dimensionValuesSorted` which is now provided directly by `StorageAdapter`. * added test for topn optimization path checking * add Capabilities back since StorageAdapter is marked PublicApi * oops * add javadoc, fix build i think * correctly revert api changes * fix intellij fail * fix typo :(	2018-05-31 09:53:29 -07:00
Jihoon Son	9dca5ec76b	Simple cleanup for ThreadPoolTaskRunner and SetAndVerifyContextQueryRunner / Add ThreadPoolTaskRunnerTest (#5557 ) * Simple fix for ThreadPoolTaskRunner * fix build * address comments * update javadoc * fix build * fix test * add dependency	2018-05-15 22:53:11 +05:30
Alexander Saydakov	15864434be	ArrayOfDoublesSketch module (#5148 ) * ArrayOfDoublesSketch module * UTF-8 fix * javadoc, style fixes * more style fixes * null key selector fix * more style fixes * removed @Override, strict compiler doesn't like it * removed @Override, strict compiler doesn't like it * IndexedInts is not autoclosable? removed one more @0verride * synchronized with upstream master * removed unused imports * addressed review points * null fix * addressed review points * IAE from druid package * synchronized aggregate() and get() * use locks per buffer position * corrected javadoc * style fixes * added lock and narrowed the scope * addressed review comments * conflict resolution went wrong * addressed review comments * javadoc * javadoc links * fully qualified name since there is no import for this class * addressed review points * style fix * StandardCharsets.UTF_8 * addressed review points * added @Override * added equals and hashCode tests for post aggs * formatting * suppress warnings * optimal IndexedInts iteration * suppress SelfEquals * added comments about getClass() in equals()	2018-05-13 15:48:00 +03:00
Dylan Wylie	e8caf02147	Revert "Use a bimap for reverse lookups on injective maps" (#5764 ) * Revert "Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor (#5704)" This reverts commit `c7a59394e0`. * Revert "Fix metrics for inserting segments (#5749)" This reverts commit `c9d645103b`. * Revert "Typo fix in historical doc (#5753)" This reverts commit `aa23fe6386`. * Revert "Use a bimap for reverse lookups on injective maps (#5681)" This reverts commit `e1277d306c`.	2018-05-09 19:12:36 -07:00
Surekha	2f8904e25f	Check against the real default of maxBytes(1/6 max mem) in AppenderatorImpl's add (#5758 ) * The check for maxBytesInMemory should be >= 0 instead of > 0 * if the default value is 0, the actual check could be skipped * fix the message for persistReasons * Address PR comments * if maxBytes set -1, make is Long.MAX_VAL, so we do not need to check if it's 0 or -1 * set the maxBytesTuningconfig in AppenderatorImpl constructor to avoid duplicate code * fix the failing test cases * Address PR comments	2018-05-09 13:41:51 -07:00
Dylan Wylie	e1277d306c	Use a bimap for reverse lookups on injective maps (#5681 ) * Use a bimap for reverse lookups on injective maps - A BiMap provides constant-time lookups for mapping values to keys * Address comments * Fix Tests	2018-05-07 18:46:21 -07:00
Fokko Driesprong	a95ec92296	Move to the org.lz4 dependency (#5746 ) The net.jpountz.lz4 moved to org.lz4	2018-05-07 08:16:45 -07:00
kaijianding	c12c16385e	support throw duplcate row during realtime ingestion in RealtimePlumber (#5693 )	2018-05-04 10:12:25 -07:00
Surekha	13c616ba24	'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583 ) * This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing.If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting. * The default value is 1/3(Runtime.maxMemory()) * To maintain the current behaviour set 'maxBytesInMemory' to -1 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist * Fix check style and remove a comment * Add overlord unsecured paths to coordinator when using combined service (#5579) * Add overlord unsecured paths to coordinator when using combined service * PR comment * More error reporting and stats for ingestion tasks (#5418) * Add more indexing task status and error reporting * PR comments, add support in AppenderatorDriverRealtimeIndexTask * Use TaskReport instead of metrics/context * Fix tests * Use TaskReport uploads * Refactor fire department metrics retrieval * Refactor input row serde in hadoop task * Refactor hadoop task loader names * Truncate error message in TaskStatus, add errorMsg to task report * PR comments * Allow getDomain to return disjointed intervals (#5570) * Allow getDomain to return disjointed intervals * Indentation issues * Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551) * Adding feature thetaSketchConstant to do some set operation in PostAggregator * Updated review comments for PR #5551 - Adding thetaSketchConstant * Fixed CI build issue * Updated review comments 2 for PR #5551 - Adding thetaSketchConstant * Fix taskDuration docs for KafkaIndexingService (#5572) * With incremental handoff the changed line is no longer true. * Add doc for automatic pendingSegments (#5565) * Add missing doc for automatic pendingSegments * address comments * Fix indexTask to respect forceExtendableShardSpecs (#5509) * Fix indexTask to respect forceExtendableShardSpecs * add comments * Deprecate spark2 profile in pom.xml (#5581) Deprecated due to https://github.com/druid-io/druid/pull/5382 * CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586) Also switch various firehoses to the new method. Fixes #5585. * This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing.If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting. * The default value is 1/3(Runtime.maxMemory()) * To maintain the current behaviour set 'maxBytesInMemory' to -1 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist * Address code review comments * Fix the coding style according to druid conventions * Add more javadocs * Rename some variables/methods * Other minor issues * Address more code review comments * Some refactoring to put defaults in IndexTaskUtils * Added check for maxBytesInMemory in AppenderatorImpl * Decrement bytes in abandonSegment * Test unit test for multiple sinks in single appenderator * Fix some merge conflicts after rebase * Fix some style checks * Merge conflicts * Fix failing tests Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex * Address PR comments * Put defaults for maxRows and maxBytes in TuningConfig * Change/add javadocs * Refactoring and renaming some variables/methods * Fix TeamCity inspection warnings * Added maxBytesInMemory config to HadoopTuningConfig * Updated the docs and examples * Added maxBytesInMemory config in docs * Removed references to maxRowsInMemory under tuningConfig in examples * Set maxBytesInMemory to 0 until used Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing and set to part of max jvm memory when ingestion task starts * Update toString in KafkaSupervisorTuningConfig * Use correct maxBytesInMemory value in AppenderatorImpl * Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory Experimenting with various defaults, 1/3 jvm memory causes OOM * Update docs to correct maxBytesInMemory default value * Minor to rename and add comment * Add more details in docs * Address new PR comments * Address PR comments * Fix spelling typo	2018-05-03 16:25:58 -07:00
Jihoon Son	86746f82d8	Use mergeBuffer instead of processingBuffer in parallelCombiner (#5634 ) * Use mergeBuffer instead of processingBuffer in parallelCombiner * Fix test * address comments * fix test * Fix test * Update comment * address comments * fix build * Fix test failure	2018-04-27 18:14:37 -07:00
Roman Leventov	9be000758d	Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335 ) * Refactor index merging, replace Rowboats with RowIterators and RowPointers * Add javadocs * Fix a bug in QueryableIndexIndexableAdapter * Fixes * Remove unused declarations * Remove unused GenericColumn.isNull() method * Fix test * Address comments * Rearrange some code in MergingRowIterator for more clarity * Self-review * Fix style * Improve docs * Fix docs * Rename IndexMergerV9.writeDimValueAndSetupDimConversion to setUpDimConversion() * Update Javadocs * Minor fixes * Doc fixes, more code comments, cleanup of RowCombiningTimeAndDimsIterator * Fix doc link	2018-04-27 17:34:32 -07:00
Slim Bouguerra	73da7426da	Timeseries results are incoherent for case interval is out of range and case false filter. (#5649 ) * adding some tests Change-Id: I92180498e2e6695212b286d980e349c136c78c86 * added empty sequence runner Change-Id: I20c83095072bbf3b4a3a57dfc1934d528e2c7a1a * treat only granularity ALL Change-Id: I1d88fab500c615bc46db4f4497ce93089976441f * moving toList within If and add expected queries Change-Id: I56cdd980e44f0685806efb45e29031fa2e328ec4 * typo Change-Id: I42fdd28da5471f6ae57d3962f671741b106300cd * adding tests and fix logic of intervals Change-Id: I0bd414d2278e3eddc2810e4f5080e6cf6a117f12 * fix style Change-Id: I99a2380934c9ab350ca934c56041dc343c08b99f * comments review Change-Id: I726a3b905a9520d8b1db70e4ba17853c65c414a4	2018-04-23 15:55:18 -07:00
Roman Leventov	a3a9ada843	Add GenericWhitespace checkstyle check (#5668 )	2018-04-24 01:09:14 +05:30
scrawfor	15f4ab2b31	Expose noop filter to users (#5597 )	2018-04-18 07:57:07 -07:00
Gian Merlino	5d09f76df6	topN: Fix caching of Float dimension values. (#5653 ) Jackson would deserialize them as Doubles, leading to ClassCastExceptions in the topN processing pipeline when it attempted to treat them as Floats.	2018-04-17 15:35:18 -07:00
Gian Merlino	fbf3fc178e	Timeseries: Add "grandTotal" option. (#5640 ) * Timeseries: Add "grandTotal" option. * Modify whitespace. * Checkstyle workaround.	2018-04-16 18:22:19 -07:00
Jihoon Son	f349e03091	Fix NPE in compactionTask (#5613 ) * Fix NPE in compactionTask * more annotations for metadata * better error message for empty input * fix build * revert some null checks * address comments	2018-04-13 00:11:03 -04:00
Gian Merlino	72d6dcda4f	ParallelCombiner: Fix buffer leak on exception in "combine". (#5630 ) Once a buffer is acquired, we need to make sure to release it if an exception is thrown before the closeable iterator is created.	2018-04-11 20:39:39 -04:00
Senthil Kumar L S	371c672828	Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551 ) * Adding feature thetaSketchConstant to do some set operation in PostAggregator * Updated review comments for PR #5551 - Adding thetaSketchConstant * Fixed CI build issue * Updated review comments 2 for PR #5551 - Adding thetaSketchConstant	2018-04-05 22:56:59 -07:00
Niketh Sabbineni	270fd1ea15	Allow getDomain to return disjointed intervals (#5570 ) * Allow getDomain to return disjointed intervals * Indentation issues	2018-04-05 22:12:30 -07:00
Jonathan Wei	969342cd28	More error reporting and stats for ingestion tasks (#5418 ) * Add more indexing task status and error reporting * PR comments, add support in AppenderatorDriverRealtimeIndexTask * Use TaskReport instead of metrics/context * Fix tests * Use TaskReport uploads * Refactor fire department metrics retrieval * Refactor input row serde in hadoop task * Refactor hadoop task loader names * Truncate error message in TaskStatus, add errorMsg to task report * PR comments	2018-04-05 21:38:57 -07:00
Kirill Kozlov	8878a7ff94	Replace guava Charsets with native java StandardCharsets (#5545 )	2018-03-28 21:00:08 -07:00
Niketh Sabbineni	912adcc284	ArrayAggregation: Use long to avoid overflow (#5544 ) * ArrayAggregation: Use long to avoid overflow * Add Tests	2018-03-28 16:37:53 -07:00
Jihoon Son	024e0a9cca	Respect forceHashAggregation in queryContext (#5533 ) * Respect forceHashAggregation in queryContext * address comment	2018-03-28 14:15:38 -07:00
Atul Mohan	ec17a44e09	Add result level caching to Brokers (#5028 ) * Add result level caching to Brokers * Minor doc changes * Simplify sequences * Move etag execution * Modify cacheLimit criteria * Fix incorrect etag computation * Fix docs * Add separate query runner for result level caching * Update docs * Add post aggregated results to result level cache * Fix indents * Check byte size for exceeding cache limit * Fix indents * Fix indents * Add flag for result caching * Remove logs * Make cache object generation synchronous * Avoid saving intermediate cache results to list * Fix changes that handle etag based response * Release bytestream after use * Address PR comments * Discard resultcache stream after use * Fix docs * Address comments * Add comment about fluent workflow issue	2018-03-23 19:11:52 -07:00
Jihoon Son	1ad898bde2	Use the official aws-sdk instead of jet3t (#5382 ) * Use the official aws-sdk instead of jet3t * fix compile and serde tests * address comments and fix test * add http version string * remove redundant dependencies, fix potential NPE, and fix test * resolve TODOs * fix build * downgrade jackson version to 2.6.7 * fix test * resolve the last TODO * support proxy and endpoint configurations * fix build * remove debugging log * downgrade hadoop version to 2.8.3 * fix tests * remove unused log * fix it test * revert KerberosAuthenticator change * change hadoop-aws scope to provided in hdfs-storage * address comments * address comments	2018-03-21 15:36:54 -07:00
Clint Wylie	885b975c95	fix LongsColumnWithNulls and FloatsColumnWithNulls to override isNull in order to actually use nullValueBitmap (#5510 )	2018-03-20 16:04:08 -07:00
Charles Allen	58f110f7f8	Future-proof some Guava usage (#5414 ) * Future-proof some Guava usage * Use a java-util EmptyIterator instead of Guava's * Change some of the guava future handling to do manual async transforms. Guava changes transform into transformAsync by deprecating transform in ONLY Guava 19. Then its gone in 20 * Use `Collections.emptyIterator()` * Pretty formatting * Make listenable future transforms a thing in default druid * Format fix * Add forbidden guava apis * Make the ListenableFutrues.transformAsync have comments * Undo intellij bad pattern matching in comments * Futrues --> Futures * Add empty iterators forbidding * Fix extra `A` * Correct method signature * Address review comments * Finish Gian review comments * Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax	2018-03-20 08:59:33 -07:00
Slim	17c71a2a60	Make Doubles aggregators use 64bits by default (#5478 ) * use 64-bit float representation for double based aggregator Change-Id: Ia4f442037052add178f6ac68138c9d52f96c6e09 * review comments Change-Id: I5a588f7364f236bf22f2b138e9d743bfb27c67fe	2018-03-19 19:13:04 -07:00
Roman Leventov	693e3575f9	Remove unused code and exception declarations (#5461 ) * Remove unused code and exception declarations * Address comments * Remove redundant Exception declarations * Make FirehoseFactoryV2.connect() to throw IOException again	2018-03-16 22:11:12 +01:00
Samarth Jain	afa25202a3	Segment filtering should be done by looking at the inner most query o… (#5496 ) * Segment filtering should be done by looking at the inner most query of a nested query * Fixing checkstyle errors * Addressing code review comments	2018-03-16 14:05:14 -07:00
Gian Merlino	a08efe4683	Fix round robining in router. (#5500 ) * Fix round robining in router. Say that ten times fast. For query endpoints, AsyncQueryForwardingServlet called hostFinder.getDefaultServer() to set a default server, followed by hostFinder.getServer(inputQuery) to override it with query-specific routing. Since hostFinder is round-robin, this skips a server. When there are only two servers, one server is _always_ skipped and the router sends all queries to the same broker. * Adjust spacing.	2018-03-15 18:45:59 -07:00
Gian Merlino	16b81fcd53	SegmentMetadataQuery: Fix default interval handling. (#5489 ) * SegmentMetadataQuery: Fix default interval handling. PR #4131 introduced a new copy builder for segmentMetadata that did not retain the value of usingDefaultInterval. This led to it being dropped and the default-interval handling not working as expected. Instead of using the default 1 week history when intervals are not provided, the segmentMetadata query would query _all_ segments, incurring an unexpected performance hit. This patch fixes the bug and adds a test for the copy builder. * Intervals	2018-03-15 10:05:46 -07:00
Niketh Sabbineni	40cc2c8740	Query should not fail because emitter fails or throws Exception (#5484 )	2018-03-13 19:57:05 -07:00
Roman Leventov	6b158abe3f	Enforce optimal IndexedInts iteration (#5456 ) * Enforce optimal IndexedInts iteration * Fix remaining suboptimal usages	2018-03-09 09:42:40 -08:00
Niraja Mishra	ba3dbf2a42	Fixed NPE when dimension is null or empty. https://github.com/druid-io/druid/issues/3007 (#5299 )	2018-03-05 16:27:35 -08:00
Gian Merlino	7416d1d02d	Add "joda" option to timeFormat extractionFn. (#5448 )	2018-03-02 19:59:26 -08:00
Jonathan Wei	cf5f74b013	Fix GroupBy limit push down descending sorting on numeric columns (#5453 )	2018-03-01 18:43:45 -08:00
Gian Merlino	e4eaee3806	Support for disabling bitmap indexes. (#5402 ) * Support for disabling bitmap indexes. Can save space for columns where bitmap indexes are pointless (like free-form text). * Remove import. * Fix CompactionTaskTest. * Update for review comments. * Review comments, tests. * Fix test.	2018-02-28 19:19:56 -08:00
Niraja Mishra	0f009a41e1	Fixed PeriodGranularity for Asia pacific timezones (#5410 )	2018-02-27 10:39:50 -08:00
Nishant Bangarwa	219e77aeac	SQL compatible Null Handling Part - Expressions and Storage Changes (#5278 ) * SQL compatible Null Handling Part - Expressions, Storage and Dimension Selector Changes fix travis strict compilation * fix teamcity error - remove unused method * review comments * review comments * more comments * review comments * review comments * Optimize isNull method * Optimize isNull in ColumnarFloats/Longs/Doubles * review comment - separate classes for null and non-null columns fix intellij inspection * remove unused import * More Review comments * improve comment * More review comments * fix checkstyle * more review comments * review comments. fix javadoc links remove Nullable from ConstantColumnValueSelector * review comments. * satisfy teamcity inspections	2018-02-21 13:27:26 +01:00
Jihoon Son	deeda0dff2	Fix DefaultLimitSpec to respect sortByDimsFirst (#5385 ) * Fix DefaultLimitSpec to respect sortByDimsFirst * fix bug * address comment	2018-02-16 15:26:32 -08:00
Roman Leventov	e64ffb10c2	Standardize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366 )	2018-02-07 13:24:30 -08:00
Gian Merlino	971d45ab3f	Use a separate snapshot file per lookup tier. (#5358 ) Prevents conflicts if two processes on the same machine use the same lookup snapshot directory but are in different tiers.	2018-02-07 11:28:53 -08:00
Gian Merlino	e255d66b85	Fix two improper casts in HavingSpecMetricComparator. (#5352 ) * Fix two improper casts in HavingSpecMetricComparator. Fixes two things: 1. An improper double-to-long cast when comparing double metrics to any kind of value, which was a regression from #4883. 2. An improper double-to-long cast when comparing a long/int metric to a double/float value: the value was cast to long/int, drawing strange conclusions like int 100 matching a havingSpec of equalTo(100.5). * Add comments. * Remove extraneous comment. * Simplify code a bit.	2018-02-06 13:18:55 -08:00
Gian Merlino	c21ff6e81c	Properly set "identity" in query metrics. (#5330 ) * Properly set "identity" in query metrics. This patch adds an "identity" field to QueryPlus and sets it in QueryLifecycle when the query starts executing. This is important because it allows it to be used for future QueryMetrics created by that QueryPlus object. We also add "identity" to the request-level QueryMetrics object created in emitLogsAndMetrics. * Remove unused method.	2018-02-06 10:53:00 -08:00
Gian Merlino	8c738c7076	Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager (#5344 ) * Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager. Both were susceptible to the following conditions: 1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the other was writing, or by writing to the file at the same time. 2. One JVM could partially write a file, then crash, leaving a truncated file. * Use StringUtils.format	2018-02-06 09:44:06 -08:00
Slim	37c09ce3f8	Use both Joda Ids and Java IDs as Timezone to string readers (#5349 ) * Use both Joda Ids and Java IDs as Timezone to string readers Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3 * move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0 * exclude class and remove other usage Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615 * remove it from test classes Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4 * remove unused Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052	2018-02-06 16:34:11 +05:30
Gian Merlino	9a62b02cb7	Extensions: Option to load classes from extension jars first. (#5321 ) The behavior is configurable through druid.extensions.useExtensionClassloaderFirst. It is useful when extensions want to load a dependency different from one provided by Druid, for example a different version of geoip or protobuf.	2018-02-06 16:14:03 +05:30
Jonathan Wei	285dedd126	More ParseException handling for numeric dimensions (#5312 ) * Discard rows with unparseable numeric dimensions * PR comments * Don't throw away entire row on parse exception * PR comments * Fix import	2018-02-05 21:43:35 -08:00
Gian Merlino	7e02408510	Update versions to 0.13.0-SNAPSHOT. (#5323 )	2018-02-02 12:06:38 -06:00
Himanshu	4cd47de62f	add LookupExtractorFactory.destroy() method (#5287 ) * add LookupExtractorFactory.destroy() method * fix LookupReferencesManagerTest	2018-02-01 22:56:09 -08:00
Gian Merlino	ed47a1e1a9	Lookups: Inherit "injective" from registered lookups, improve docs. (#5316 ) Code changes: - In the lookup-based extractionFns, inherit injective property from the lookup itself if not specified. Doc changes: - Add a "Query execution" section to the lookups doc explaining how injective lookups and their optimizations work. - Remove scary warnings against using registeredLookup extractionFns. They are necessary and important since they work with filters and function cascades -- two things that the dimension specs do not do. They deserve to be first class citizens. - Move the "registeredLookup" fn above the "lookup" fn. It's probably more commonly used, so the docs read better this way.	2018-02-01 18:30:19 -08:00
Jonathan Wei	80419752b5	Add metamx emitter, http clients, and metrics packages to druid java-util (#5289 ) * Add metamx java-util emitter, http clients, and metrics packages to druid java-util * Remove metamx java-util from pom.xml files * Checkstyle fixes * Import fix * TeamCity inspection fixes * Use slf4j, move some version defs to master pom.xml * Use parent jvm-attach-api and maven-surefire-plugin versions * Add ] to log msg, suppress inspection	2018-01-24 22:10:36 +01:00
Roman Leventov	61e6878afd	Check Javadoc reference integrity (#5279 )	2018-01-22 13:51:28 -08:00
Roman Leventov	a346bbc6f3	Enforce spacing around foreach colon with Checkstyle (#5271 )	2018-01-22 11:48:51 -08:00
Roman Leventov	f99c27e9e0	Fix bugs in ImmutableRTree; Merge bytebuffer-collections module into druid-processing (#5275 ) * Fix bugs in ImmutableRTree; optimize ImmmutableRTreeObjectStrategy.writeTo(); Merge bytebuffer-collections module into druid-processing * Remove unused declaration * Fix another bug	2018-01-23 00:49:59 +05:30
Roman Leventov	87c744ac1d	Add MethodParamPad, OneStatementPerLine and EmptyStatement Checkstyle checks (#5272 )	2018-01-18 11:29:23 -08:00
Roman Leventov	ad6cdf5d09	Reuse IndexedInts returned from DimensionSelector.getRow() implementations (#5172 ) * Reuse IndexedInts in DimensionSelector implementations * Remove BaseObjectColumnValueSelector.getObject() doc * typo	2018-01-17 16:01:26 +01:00
Clint Wylie	491f8cca81	fix timewarp query results when using timezones and crossing DST transitions (#5157 ) * timewarp and timezones changes: * `TimewarpOperator` will now compensate for daylight savings time shifts between date translation ranges for queries using a `PeriodGranularity` with a timezone defined * introduces a new abstract query type `TimeBucketedQuery` for all queries which have a `Granularity` (100% not attached to this name). `GroupByQuery`, `SearchQuery`, `SelectQuery`, `TimeseriesQuery`, and `TopNQuery` all extend `TimeBucketedQuery`, cutting down on some duplicate code and providing a mechanism for `TimewarpOperator` (and anything else) that needs to be aware of granularity * move precondition check to TimeBucketedQuery, add Granularities.nullToAll, add getTimezone to TimeBucketQuery * formatting * more formatting * unused import * changes: * add 'getGranularity' and 'getTimezone' to 'Query' interface * merge 'TimeBucketedQuery' into 'BaseQuery' * fixup tests from resulting serialization changes * dedupe * fix after merge * suppress warning	2018-01-11 12:39:33 -08:00
Roman Leventov	8877ce38d6	Enforce modifier order with Checkstyle (#5246 )	2018-01-11 09:50:42 +01:00
Roman Leventov	535ec437e9	Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier (#5176 ) * Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers * Fix byte order	2018-01-05 16:08:07 +09:00
Jonathan Wei	935ac646f4	Upgrade to Calcite 1.15.0 (#5210 ) * Upgrade to Calcite 1.15.0 * Use Filtration.eternity()	2018-01-04 12:11:24 -08:00
Roman Leventov	579f9fbedf	Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175 ) * Add IndexedInts.debugToString() and AbstractIndex.toString() * Fix AppenderatorTest	2018-01-04 09:56:47 +09:00
Roman Leventov	dc87e4fda1	Renamed IndexedFloats/Doubles/Longs to ColumnarFloats/..., IndexedMultivalue to ColumnarMultiInts, separate IndexedInts from ColumnarInts, many other renames for consistency in io.druid.segment.data package (#5171 )	2017-12-20 18:50:07 -08:00
Clint Wylie	1181411901	small optimization in timeseries if 'skipEmptyBuckets' is true and cursor completed (#5178 )	2017-12-19 16:47:00 -06:00
Roman Leventov	f18eba50ee	Remove Aggregator.reset() (#5177 )	2017-12-19 14:09:17 -08:00
Roman Leventov	5787d04fad	Bump Druid version to 0.12.0 (#5138 )	2017-12-15 07:37:01 -08:00
Roman Leventov	64848c7ebf	DataSegment memory optimizations (#5094 ) * Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers * Fix DataSegmentTest * Refinements * Try to fix * Fix the second DataSegmentTest * Nullability * Fix tests * Fix tests, unify to use TestHelper.getJsonMapper() * Revert TestUtil as ServerTestHelper, fix tests * Add newline * Fix indexing tests * Fix s3 tests * Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper() * Fix HDFS tests * Fix HdfsDataSegmentPusherTest * Capitalize constant names	2017-12-12 11:41:40 -08:00
Charles Allen	4365390310	Remove duplicate fastutil dependency in processing pom.xml (#5142 )	2017-12-07 08:54:48 +09:00
Alexander Saydakov	45f91a241e	numeric quantiles sketch aggregator (#5002 ) * numeric quantiles sketch aggregator * it seems that we need to synchronize all methods, which modify the state * Seems like a false positive with -Pstrict * code style fix * code style fix * use sketches-core-0.10.3 * moved cache ids to the central place * better class names * support large columns * explained autodetection, added exception * added comments regarding sketches moving on heap * support reindexing * implemented suggestions from jihoonson * style fix * use max(k, other.k) for better accuracy * check for NilColumnValueSelector instead of null * throw exceptions instead of providing no-op comparators	2017-12-06 08:18:08 +09:00
Roman Leventov	a7a6a0487e	Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762 ) * Replace IOPeon with OutputMedium; Improve compression * Fix test * Cleanup CompressionStrategy * Javadocs * Add OutputBytesTest * Address comments * Random access in OutputBytes and GenericIndexedWriter * Fix bugs * Fixes * Test OutputBytes.readFully() * Address comments * Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes * Add comments to ByteBufferInputStream * Remove unused declarations	2017-12-04 18:04:27 -08:00
Parag Jain	7c01f77b04	Parse Batch support (#5081 ) * add parseBatch and deprecate parse method in InputRowParser add addAll method, skip max rows in memory check for it remove parse method from implemetations transform transformers add string multiplier input row parser fix withParseSpec fix kafka batch indexing fix isPersistRequired comments * add unit test * make persist async * review comments	2017-12-04 16:06:16 -06:00
Roman Leventov	aacc57131b	Don't use HistoricalXxxPrototypes in PooledTopNAlgorithm when Cursor's offset is FilteredOffset (#5133 ) * Don't use HistoricalXxxPrototypes in PooledTopNAlgorithm when Cursor's offset is FilteredOffset, because it doesn't support cloning * Add test	2017-12-01 17:04:22 -08:00
Gian Merlino	5f6bdd940b	SQL: Improve translation of time floor expressions. (#5107 ) * SQL: Improve translation of time floor expressions. The main change is to TimeFloorOperatorConversion.applyTimestampFloor. - Prefer timestamp_floor expressions to timeFormat extractionFns, to avoid turning things into strings when it isn't necessary. - Collapse CAST(FLOOR(X TO Y) AS DATE) to FLOOR(X TO Y) if appropriate. * Fix tests.	2017-11-29 12:06:03 -08:00
Chuanlei Ni	368d03146b	assign granularity.all to SelectQuery by default (#5091 )	2017-11-21 17:10:19 -08:00
zhangxinyu1	590633c595	fix bug in method reset of ByteBufferHashTable.java (#5100 )	2017-11-17 13:19:16 -03:00
Jonathan Wei	ec6774039e	Fix numLookupLoadingThreads default value (#5097 )	2017-11-16 15:13:52 -08:00
Gian Merlino	77df5e0673	ExpressionSelectors: Add optimized selectors. (#5048 ) * ExpressionSelectors: Add caching selectors. - SingleLongInputCaching selector for expressions on the __time column, using a similar optimization to SingleScanTimeDimSelector - SingleStringInputDimensionSelector for expressions on string columns that return strings, using a similar optimization to ExtractionFn based DimensionSelectors. - SingleStringInputCaching selector for expressions on string columns that return primitives. Also, in the SQL planner, prefer expressions for time operations rather than extractionFns. * Code review comments.	2017-11-13 20:24:24 -08:00
Akash Dwivedi	c1538f29fc	maxQueryTimeout property in runtime properties. (#4852 ) * maxQueryTimeout property in runtime properties. * extra line * move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle. * Fix initialize method. * remove unused import. * doc update. * some more details in doc about query failure.. * minor fix. * decorating QueryRunner to set and verify context. Added by servers. * remove whitespace.	2017-11-13 19:23:11 -06:00
Gian Merlino	9444da5038	SQL: Improved behavior when implicitly casting strings to date/time literals. (#5023 ) * SQL: Improved behavior when implicitly casting strings to date/time literals. - Handle all flavors of ISO8601 and SQL literals. - Throw errors on other literals instead of silently transforming them to 0. * Respect timeZone when format is null.	2017-11-10 17:43:22 +09:00
Roman Leventov	3541b7544b	Prohibit and remove unused declarations in the processing module (#4930 ) * Prohibit and remove unused declarations in the processing module * Fix tests * Fix integration tests * Suppress unused * Try to remove SuppressWarnings unused in VirtualColumn * Remove reset 'false positives' * Annotate CliCommandCreator as ExtensionPoint * Unused import warning instead of error in IntelliJ * Fixes * Add comment * Fix AzureBlob * Fix CloudFilesBlob * Address comments * Add Project SDK section to INTELLIJ_SETUP.md * Fix image	2017-11-09 09:27:27 -08:00
Roman Leventov	a8dc056c09	Add retries for coordinator fetch and lookup start in LookupReferencesManager (#5029 ) * Add retries for coordinator fetch and lookup start in LookupReferencesManager * Fix LookupConfigTest * Address comments * Address more comments * And address more comments * Address comms * Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin	2017-11-09 02:30:36 -03:00
Jihoon Son	5f3c863d5e	Add compaction task (#4985 ) * Add compaction task * added doc * use combining aggregators * address comments * add support for dimensionsSpec * fix getUniqueDims and getUniqueMetics * find unique dimensionsSpec * fix compilation * add unit test * fix test * fix test * test for different dimension orderings and types, and doc for type and ordering * add control for custom ordering and type * update doc * fix compile * fix compile * add segments param * fix serde error * fix build	2017-11-03 21:55:27 -06:00
Roman Leventov	5eb08c27cb	Add Emitter monitoring (#4973 ) * Add Emitter monitoring * Fix typo * Fixes * testing new emitter * Fix failed test (#71) * testing new emitter * fix on failed test * Remove emitter's readTimeout from docs * Update docs * Add HttpEmittingMonitor * Update java-util to 1.3.2	2017-11-03 21:27:57 -06:00
Gian Merlino	6c725a7e06	Fix havingSpec on complex aggregators. (#5024 ) * Fix havingSpec on complex aggregators. - Uses the technique from #4883 on DimFilterHavingSpec too. - Also uses Transformers from #4890, necessitating a move of that and other related classes from druid-server to druid-processing. They probably make more sense there anyway. - Adds a SQL query test. Fixes #4957. * Remove unused import.	2017-11-01 12:58:08 -04:00
Gian Merlino	1df458b35e	Fix improper handling of empty arrays in StringDimensionIndexer. (#5012 ) * Fix improper handling of empty arrays in StringDimensionIndexer. This bug was able to introduce data errors: if the input rows to an IncrementalIndex contained entirely empty arrays and single values, then upon persisting to disk, the empty arrays would be replaced with the lexicographically smallest single value, rather than nulls like they should have been. * Style fix. * Add tests for bitmap indexes too.	2017-10-27 14:21:48 -07:00
Jihoon Son	d7024f22e1	Upgrade fastutil to 8.1.0 (#4988 ) * Upgrade fastutil to 8.1.0 * unused import	2017-10-19 23:37:43 -05:00
Jihoon Son	52d7f74226	Add streaming aggregation as the last step of ConcurrentGrouper if data are spilled (#4704 ) * Add steaming grouper * Fix doc * Use a single dictionary while combining * Revert GroupByBenchmark * Removed unused code * More cleanup * Remove unused config * Fix some typos and bugs * Refactor Groupers.mergeIterators() * Add comments for combining tree * Refactor buildCombineTree * Refactor iterator * Add ParallelCombiner * Add ParallelCombinerTest * Handle InterruptedException * use AbstractPrioritizedCallable * Address comments * [maven-release-plugin] prepare release druid-0.11.0-sg * [maven-release-plugin] prepare for next development iteration * Address comments * Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `5c6b31e488`. * Revert "[maven-release-plugin] prepare release druid-0.11.0-sg" This reverts commit `0f5c3a8b82`. * Fix build failure * Change list to array * rename sortableIds * Address comments * change to foreach loop * Fix comment * Revert keyEquals() * Remove loop * Address comments * Fix build fail * Address comments * Remove unused imports * Fix method name * Split intermediate and leaf combine degrees * Add comments to StreamingMergeSortedGrouper * Add more comments and fix overflow * Address comments * ConcurrentGrouperTest cleanup * add thread number configuration for parallel combining * improve doc * address comments * fix build	2017-10-17 23:24:08 -07:00
Slim	af2bc5f814	Make float default representation for DoubleSum/Min/Max aggregators (#4944 ) * Introduce System wide property to select how to store double. Set the default to store as float Change-Id: Id85cca04ed0e7ecbce78624168c586dcc2adafaa * fix tests Change-Id: Ib42db724b8a8f032d204b58c366caaeabdd0d939 * Change the property name Change-Id: I3ed69f79fc56e3735bc8f3a097f52a9f932b4734 * add tests and make default distribution store doubles as 64bits Change-Id: I237b07829117ac61e247a6124423b03992f550f2 * adding mvn argument to parallel-test profile Change-Id: Iae5d1328f901c4876b133894fa37e0d9a4162b05 * move property name and helper function to io.druid.segment.column.Column Change-Id: I62ea903d332515de2b7ca45c02587a1b015cb065 * fix docs and clean style Change-Id: I726abb8f52d25dc9dc62ad98814c5feda5e4d065 * fix docs Change-Id: If10f4cf1e51a58285a301af4107ea17fe5e09b6d	2017-10-16 17:17:22 -07:00
Himanshu	a7e802c9d4	greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..) (#4883 ) * greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..) * fix the unit test and expect having to work on hyperUnique agg * test fix * fix style errors	2017-10-16 12:02:30 -07:00
Roman Leventov	dc7cb117a1	Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886 ) * Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism * Fix MapVirtualColumn.makeColumnValueSelector() * Minor fixes * Fix IndexGeneratorCombinerTest * DimensionSelector to return zeros when treated as numeric ColumnValueSelector * Fix IncrementalIndexTest * Fix IncrementalIndex.makeColumnSelectorFactory() * Optimize MapBasedRow.getMetric() * Fix VarianceAggregatorTest * Simplify IncrementalIndex.makeColumnSelectorFactory() * Address comments * More comments * Test	2017-10-13 21:44:17 -05:00
Jihoon Son	8d9902831e	Refactoring PrefetchableTextFilesFirehoseFactory (#4836 ) * Refactoring prefetchable firehose * Fix to read cache when prefetch is disabled * More tests * Cleanup codes * Add Fetcher * Fix test failure * Count file size * Fix test * rename generic parameter * address comments * address comments * reuse buffer * move Execs to java-util * use execs * Fix build	2017-10-13 21:39:28 -05:00
Jihoon Son	675c6c00dd	Add checkstyle and intellij rule to prohibit unnecessary qualifiers in interfaces (#4958 ) * add checkstyle and intellij rule * fix tc fail	2017-10-13 07:56:19 -07:00
Atul Mohan	c07678b143	Synchronization of lookups during startup of druid processes (#4758 ) * Changes for lookup synchronization * Refactor of Lookup classes * Minor refactors and doc update * Change coordinator instance to be retrieved by DruidLeaderClient * Wait before thread shutdown * Make disablelookups flag true by default * Update docs * Rename flag * Move executorservice shutdown to finally block * Update LookupConfig * Refactoring and doc changes * Remove lookup config constructor * Revert Lookupconfig constructor changes * Add tests to LookupConfig * Make executorservice local * Update LRM * Move ListeningScheduledExecutorService to ExecutorCompletionService * Move exception to outer block * Remove check to see future is done * Remove unnecessary assignment * Add logging	2017-10-12 21:22:24 -05:00
Gian Merlino	32f36beaae	QueryableIndexStorageAdapter: Lift column cache to Cursor sequence. (#4950 ) * QueryableIndexStorageAdapter: Lift column cache to Cursor sequence. This is where it was before #4710, when it was moved to the individual Cursors, leading to higher than expected memory usage. It could be extreme for finer query granularities like "second". * Comment.	2017-10-12 16:44:33 -05:00
Jihoon Son	56fb11ce0b	Lazy initialization for JavaScript functions (#4871 ) * Lazy initialization of JavaScript functions * Fix test failure * Fix thread-safety and postpone js conf check * Fix test fail * Fix test * Fix KafkaIndexTaskTest * Move config check	2017-10-10 21:52:42 -07:00
Jonathan Wei	18635a19b3	Remove unused limitFn in GroupByQuery (#4935 ) * Remove unused limitFn in GroupByQuery * Remove unused limitFn creation logic	2017-10-10 15:56:30 -07:00
Gian Merlino	b20e3038b6	SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. (#4889 ) * SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. This brings benefits: - Ability to do GROUP BY and ORDER BY with ordinals. - Ability to support IN filters beyond 19 elements (fixes #4203). Some refactoring of druid-sql internals: - Builtin aggregators and operators are implemented as SqlAggregators and SqlOperatorConversions rather being special cases. This simplifies the Expressions and GroupByRules code, which were becoming complex. - SqlAggregator implementations are no longer responsible for filtering. Added new functions: - Expressions: strpos. - SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR, and DATE_TRUNC. * Add missing @Override annotation. * Adjustments for forbidden APIs. * Adjustments for forbidden APIs. * Disable GROUP BY alias. * Doc reword.	2017-10-10 12:44:05 -07:00
chunghochen	0614b92df1	adding new post aggregators for test statistics to druid-stats extension (#4532 ) * adding new post aggregators of test stats to druid-stats extension * changes to address code review comments * fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master * add @Override annotation per CI log * make changes per review comments/discussions * remove some blocks per review comments	2017-10-09 23:43:27 -07:00
Akash Dwivedi	716a5ec1a8	Add identity to DefaultSearchQueryMetrics and DefaultSelectQueryMetrics. (#4906 )	2017-10-04 20:28:23 -05:00
Akash Dwivedi	2ee32399ff	granularity method in QueryMetrics. (#4570 ) * granularity method in QueryMetrics. PR to emit granularity dimension for timeseries, search, groupBy, select and topN queries. * QueryMetricsFactory classes for search and select queries. * Empty implementation for Granularity() method. * Review comment changes. * Remove unused import. * empty query() method. * checkstyle fix. * Import fix.	2017-10-04 09:42:52 -07:00
Jonathan Wei	5fbec5b435	Fix limit push down comparator bug (#4868 )	2017-10-02 11:44:23 -07:00
Gian Merlino	1f2074c247	Bump versions in master to 0.11.1-SNAPSHOT. (#4878 ) * Bump versions in master to 0.11.1-SNAPSHOT. * Missed a few.	2017-09-28 17:09:51 -05:00
Gian Merlino	a19f22b5bb	Add identity to query metrics, logs. (#4862 ) * Add identity to query metrics, logs. Also fix a bug where unauthorized requests would not emit any logs or metrics, and instead would log a "Tried to emit logs and metrics twice" warning. Also rename QueryResource's "getServer" to "cancelQuery", because that's what it does. * Do not emit identity by default.	2017-09-28 11:45:23 -07:00
Goh Wei Xiang	2c30d5ba55	Add org.joda.time.DateTime.parse() to forbidden APIs (#4857 ) * Added org.joda.time.DateTime#(java.lang.String) to forbidden API. * Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API. * Add additional APIs that may create DateTime with default time zone * Add helper function that accepts formatter to parse String. * Add additional forbidden APIs * Replace existing usage of forbidden APIs * Use wrapper class to enforce Chronology on DateTimeFormatter. * Creates constant UtcFormatter for constant ISODateTimeFormat.	2017-09-27 17:46:44 -05:00
Roman Leventov	9c126e2aa9	Forbid MapMaker (#4845 ) * Forbid MapMaker * Shorter syntax * Forbid Maps.newConcurrentMap()	2017-09-27 06:49:47 -07:00
Roman Leventov	e267f3901b	Enforce Indentation with Checkstyle (#4799 )	2017-09-21 13:06:48 -07:00
Roman Leventov	a9d8539802	Remove IndexedInts.iterator() (#4811 ) * Remove IndexedInts.iterator() * Retain IndexedInts.iterator(), but don't extend Iterable * Add BitmapValues * Fix tests	2017-09-20 21:25:52 -07:00
Roman Leventov	88e9a80636	Rename ObjectValueSelector.get() to getObject(); Add getObject() and classOfObject() to ColumnValueSelector (#4801 )	2017-09-19 14:47:20 -05:00
Roman Leventov	24646ac76a	LZ4 decompression forward compatibility (#4824 )	2017-09-19 10:18:37 -07:00
Charles Allen	00d39ce7a5	Move checks for bitmap size == 0 to isEmpty (#4820 )	2017-09-19 21:45:16 +05:30
Jonathan Wei	c2a0e753b6	Extension points for authentication/authorization (#4271 ) * Extension points for authentication/authorization * Address some PR comments * Authorization result caching * Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter * Use Set for auth caching, close outputstreams in filters * Don't close output stream on success in sanity check filter * Add ConfigResourceFilter to coordinator lookups * Fix filtering authorization check for empty resource list * HttpClient users must explicitly escalate the client * Remove response modification from PreResponseAuthorizationCheckFilter * Remove extraneous pom.xml * Fix unit test * Better lifecycle management * Rename AuthorizationManager to Authorizer * Fix authorization denials for empty supervisor list * Address some PR comments * Address more PR comments * Small cleanup * Add Jetty HttpClient wrapper to Authenticator * Remove Authorizer start/stop * Restore immutable context map in DruidConnection, UT fix * Fix/update docs * Add authorization checks to EventReceiverFirehose * Fix router authorization check failure, restore PreResponseAuthorizationFilter changes * Compile fixes * Test fixes * Update Authenticator/Authorizer doc comments * Merge fixes * PR comments * Fix test * Fix IT * More PR comments * PR comments * SSL fix	2017-09-15 23:45:48 -07:00
Roman Leventov	3f92184dd8	Inspection fixes (#4809 )	2017-09-15 17:48:29 -07:00
Roman Leventov	b61248fdb1	Replace HistoricalFloatColumnSelector with more generic HistoricalColumnSelector (#4796 )	2017-09-14 13:52:06 -07:00
Akash Dwivedi	a17e48fe69	search package name correction. (#4785 ) * search package name correction. * Refactor search.search pkg to search. * remove unused import.	2017-09-14 13:50:23 -07:00
Gian Merlino	2ce8123bdb	Move scan-query from a contrib extension into core. (#4751 ) * Move scan-query from a contrib extension into core. Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion This patch also adds support for virtual columns to the Scan query, and updates Druid SQL to use Scan instead of Select. This patch also makes some behavioral changes to handling of the __time column. In particular, it is now returned as "__time" rather than "timestamp"; it is no longer included if you do not specifically ask for it in your "columns"; and it is returned as a long rather than a string. Users can revert time handling to the legacy extension behavior by setting "legacy" : true in their queries, or setting the property druid.query.scan.legacy = true. This is meant to provide a migration path for users that were formerly using the contrib extension. * Adjustments from review. * Add back Select query. * Adjust SQL docs. * Restore SelectQuery link.	2017-09-13 09:51:24 -07:00
Gian Merlino	c3a1ce6933	SQL: Fix toTimeseriesQuery and toTopNQuery. (#4780 ) The former would sometimes eat limits, and the latter would sometimes use the wrong dimension comparator.	2017-09-12 14:37:27 -07:00
Jonathan Wei	3a29521273	Fix GroupBy limit push down error when buffer is too small (#4745 ) * Fix GroupBy limit push down error when buffer is too small * Address PR comments	2017-09-12 12:34:50 -07:00
Roman Leventov	832cc293ef	Refactoring of ReferenceCountingSegment and FireHydrant (#4154 ) * Refactoring of ReferenceCountingSegment and FireHydrant * Address comment * Fix FireHydrant.closeSegment() * Address comment * Added comments to ReferenceCountingSegment	2017-09-12 14:28:35 -05:00
Gian Merlino	4909c48b0c	SQL: Full TRIM support. (#4750 ) * SQL: Full TRIM support. - Support trimming arbitrary characters - Support BOTH, LEADING, and TRAILING * Remove unused import. * Fix tests, add RTRIM / LTRIM. * Remove unused imports. * BTRIM and docs. * Replace for with foreach.	2017-09-12 11:49:08 -07:00
Gian Merlino	23c0357816	BufferHashGrouperTest: Better behavior with regard to large buffers. (#4779 ) * BufferHashGrouperTest: Better behavior with regard to large buffers. 1) Free buffers after each test 2) Avoid mmaping past the end of a file * Use CloserRule.	2017-09-11 12:10:31 -07:00
Andy Sloane	706747cc8c	Fix for sort order in select/topN query cache (#4766 ) When historical caching is enabled, and a select or topN query is issued, and then a following query with "descending": true is set, the cached query returns the ascending result (or vice versa), often resulting in invalid paging identifiers. The CacheKey for these queries doesn't include the "descending" flag; this change adds it, and fixes the problem.	2017-09-10 11:33:00 +09:00
Charles Allen	bdfc6fe25e	Move common TypeReference into JacksonUtils (#4738 )	2017-08-31 13:40:16 -07:00
Niketh Sabbineni	beecb9e210	Fix failing build, remove unused import (#4726 ) LGTM	2017-08-29 14:46:38 +09:00
Roman Leventov	4d109a358a	Refactoring of Storage Adapters (#4710 ) * Factor QueryableIndexColumnSelectorFactory and IncrementalIndexColumnSelectorFactory out of QueryableIndexStorageAdapter and IncrementalIndexStorageAdapter; Add Offset.getBaseReadableOffset(); Remove OffsetHolder interface; Replace Cursor extends ColumnSelectorFactory with composition; Reduce indirection in ColumnValueSelectors created by QueryableIndexColumnSelectorFactory * Don't override clone() in FilteredOffset (the prev. implementation was broken); Some warnings fixed * Simplify Cursors in QueryableIndexStorageAdapter * Address comments * Remove unused and unimplemented methods from GenericColumn interface * Comments	2017-08-28 18:07:31 -07:00
Gian Merlino	daf3c5f927	Add "round" option to cardinality and hyperUnique aggregators. (#4720 ) * Add "round" option to cardinality and hyperUnique aggregators. Also turn it on by default in SQL, to make math on distinct counts work more as expected. * Fix some compile errors. * Fix test. * Formatting.	2017-08-28 14:52:11 -07:00
Gian Merlino	9fbfc1be32	Add @ExtensionPoint and @PublicApi annotations. (#4433 ) * Add @ExtensionPoint and @PublicApi annotations. * Clean up wording. * Remove unused import. * Remove unused imports. * Only types can be extension points. * Adjust annotations some more. * Remove unused import. * Make ServletFilterHolder an extension point. * Add a couple extension points, and update docs.	2017-08-28 14:50:58 -07:00
Gian Merlino	43488df975	Fix dimension selectors with extractionFns on missing columns. (#4717 ) * Fix dimension selectors with extractionFns on missing columns. This patch properly applies the requested extractionFn to missing columns. It's important when the extractionFn maps null to something other than null. * Extract helper method. * Change contracts of VirtualColumns and VirtualColumn methods based on review comments. * Remove unused import. * Remove unused method. * Adjust helper function. * Adjustments	2017-08-25 18:34:42 -05:00
Roman Leventov	598cc46bae	Replace HashMap with Obj2IntMap in StringDimensionIndexer; Small optimization in StringDimensionMergerV9 (#4721 )	2017-08-25 12:30:39 -07:00
Roman Leventov	326a85a9a4	Add Offset.reset() and remove unused Offset implementations (#4706 ) * Add Offset.reset() and remove unused Offset implementations * Fix BitmapOffset * Address comments	2017-08-22 17:43:29 -07:00
Roman Leventov	cacf63b007	Add AggregateCombiners (#4676 ) * Add MetricCombiners * Rename MetricCombiner to AggregateCombiner * Spelling * Fix TimestampAggregatorFactory.combine() and add makeAggregateCombiner() implementation * Rename AggregateCombiner.combine() to fold()	2017-08-21 16:45:29 -07:00
Roman Leventov	cbd1902db8	Add forbidden-apis plugin; prohibit using system time zone (#4611 ) * Forbidden APIs WIP * Remove some tests * Restore io.druid.math.expr.Function * Integration tests fix * Add comments * Fix in SimpleWorkerProvisioningStrategy * Formatting * Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest * Address comments * Fix GroupByMultiSegmentTest	2017-08-21 13:02:42 -07:00
Roman Leventov	fa87eaa6e8	Remove IndexedInts.fill() (#4705 )	2017-08-21 13:01:34 -07:00
Asif Mansoor Amanullah	37f85b08d2	move row up/down for null metric ordering (#4681 ) * move row up/down for null metric ordering * addressed comments * addressed changes	2017-08-17 11:36:19 -05:00
Jonathan Wei	ab28dc3b97	free() dictionary merging buffers in IndexMerger (#4684 ) * free() dictionary merging buffers in IndexMerger * Use close() for dictionary merge iterators * Add comments on buffer free	2017-08-15 10:11:29 -07:00
Jonathan Wei	e91d4d1b80	Remove makeObjectColumnSelector() from DimensionIndexer (#4679 )	2017-08-11 14:39:00 -07:00
Jonathan Wei	1bddfc089c	Additional docs/log for direct memory usage (#4631 ) * Additional docs/log for direct memory usage * Tweak docs * Doc rewording	2017-08-10 23:33:20 -07:00
Roman Leventov	bf28d0775b	Remove QueryRunner.run(Query, responseContext) and related legacy methods (#4482 ) * Remove QueryRunner.run(Query, responseContext) and related legacy methods * Remove local var	2017-08-11 09:12:38 +09:00
Jihoon Son	65c1d6c797	Add IntGrouper to avoid unnecessary boxing/unboxing in array-based aggregation (#4668 ) * Add IntGrouper * Fix build * Address comments * Add a benchmark query	2017-08-10 07:41:39 -07:00
solimant	de9ba97d54	Move equals() from Float[Sum|Min|Max]AggregatorFactory to SimpleFloat... (#4675 ) Addresses #4671	2017-08-10 07:22:22 -07:00
Gian Merlino	7c89e12ca9	Replace Guava Enum.getIfPresent with builtin version. (#4659 ) * Replace Guava Enum.getIfPresent with builtin version. This is useful for running in Hadoop environments that use Guava 11. Some code is also simplified. * Code review	2017-08-09 17:20:00 -07:00
Jihoon Son	fe3421032b	Parallel sort for ConcurrentGrouper (#4660 ) * Multi-thread sort * Address comments	2017-08-09 16:24:36 -07:00
Goh Wei Xiang	42569e65e2	Minor fix in ExpressionSelectors to avoid potential NPE. (#4669 )	2017-08-09 10:13:31 -07:00
Roman Leventov	7454fd86a0	Polymorphic numeric getters for ColumnValueSelector (#4623 ) * Add methods getFloat(), getDouble() and getLong() to ColumnValueSelector * Fix copy-paste mistake in docs * Spelling	2017-08-08 18:38:06 -07:00
Roman Leventov	f5d4171459	Prohibit for loops which could be foreach with IntelliJ (#4653 ) * Replace for with foreach * Replace for with for-each in GroupByQueryEngineV2 * Remove io.druid.collections.IntList	2017-08-08 18:05:33 -07:00
Roman Leventov	aa7e4ae5e4	Enforce correct spacing with Checkstyle (#4651 )	2017-08-05 10:18:25 -07:00
Jonathan Wei	aa8d75004c	More informative QueryInterruptedException toString() (#4642 ) * More informative QueryInterruptedException toString() * Use StringUtils.format	2017-08-04 16:00:20 -07:00
Jonathan Wei	9650d80f3e	Don't use limit push down with having spec (#4630 ) * Don't use limit push down with having spec * Throw exception when forcing limit push down with having * Tests for having and limit push down * Fix pool sizes in unit test	2017-08-04 15:13:29 -07:00
Roman Leventov	7a005088d9	Add HistoricalCursor.getReadableOffset() to access unwrapped offset in selectors (#4633 ) * Add HistoricalCursor.getReadableOffset() to access unwrapped offset in selectors, when the 'main' offset is FilteredOffset (fixes #4628) * Stack overflow test	2017-08-04 12:51:48 -07:00
Jihoon Son	f3f2cd35e1	Array-based aggregation for groupBy query (#4576 ) * Array-based aggregation * Fix handling missing grouping key * Handle invalid offset * Fix compilation * Add cardinality check * Fix cardinality check * Address comments * Address comments * Address comments * Address comments * Cleanup GroupByQueryEngineV2.process * Change to Byte.SIZE * Add flatMap	2017-08-03 20:04:54 +03:00
Niketh Sabbineni	da43f68e95	NPE thrown when empty/null is passed to TimeDimExtractionFn (#4601 ) * NPE thrown when empty/null is passed to TimeDimExtractionFn * Add @Nullable wherever applicable * Add @Nullable to SearchQuerySpec.apply() * Remove unused	2017-07-26 21:02:08 -05:00
Akash Dwivedi	c372d2ecc1	Default implementation for getDouble(). (#4595 ) * Default implementation for getDouble(). * use getFloat for default implementation. * addressed comment. * new line.	2017-07-25 19:06:27 -05:00
Gian Merlino	3d6f409fc8	Fix groupBy on double dimensions. (#4596 ) * Fix groupBy on double dimensions. * Fix tests. * Fix tests. * Fix Scan tests.	2017-07-24 23:18:06 -07:00
Gian Merlino	8a4185897e	Add filter tests for both floats and doubles. (#4597 )	2017-07-24 17:02:02 -07:00
Atul Mohan	4bd0f174ba	Changes for deduplication (#4581 )	2017-07-24 11:12:23 -05:00
Roman Leventov	7408a7c4ed	Refactor CachingClusteredClient.run() (#4489 ) * Refactor CachingClusteredClient * Comments * Refactoring * Readability fixes	2017-07-23 23:10:36 +09:00
Roman Leventov	c0beb78ffd	Enforce brace formatting with Checkstyle (#4564 )	2017-07-21 10:26:59 -05:00
Slim	71e7a4c054	Adding double columns support (#4491 ) * add double columns support * Fix numbers and expected results in UTs * adding float aggregators * fix IT expected test results * fix comments * more fixes * fix comp * fix test * refactor double and float aggregator factories * fix * fix UTs * fix comments * clean unused code * fix more comments * undo unnecessary changes * fix null issue * refactor TopNColumnSelectorStrategyFactory * fix docs * refactor NumericTopNColumnSelectorStrategy * fix return * fix comments * handle the null case in DimensionIndexer * more null fixing * cosmetic changes	2017-07-20 10:14:14 +03:00
Roman Leventov	ae86323dbd	Remove unnecessary qualifier (#4565 )	2017-07-18 17:40:54 +09:00
Roman Leventov	60cdf94677	Add PMD and prohibit unnecessary fully qualified class names in code (#4350 ) * Add PMD and prohibit unnecessary fully qualified class names in code * Extra fixes * Remove extra unnecessary fully-qualified names * Remove qualifiers * Remove qualifier	2017-07-17 22:22:29 +09:00
Chris Gavin	960cb07ea6	Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474 ) * Remove some unnecessary use of boxed types. * Fix some incorrect format strings. * Enable IDEA's MalformedFormatString inspection. * Add a Checkstyle check for finding uses of incorrect logging packages. * Fix some incorrect usages of the metamx logger. * Bypass incorrect logger Checkstyle check where using the correct logger is not simple. * Fix some more places where the wrong number of arguments are provided to format strings. * Suppress `MalformedFormatString` inspection on legacy logging test. * Use @SuppressWarnings rather than a noinspection suppression comment. * Fix some more incorrect format strings. * Suppress some more incorrect format string warnings where the incorrect string is intentional. * Log the aggregator when closing it fails. * Remove some unneeded log lines.	2017-07-13 12:15:32 -07:00
Roman Leventov	b2865b7c7b	Make possible to start Peon without DI loading of any querying-related stuff (#4516 ) * Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory * Extract QueryablePeonModule and add druid.modules.excludeList config * Typo	2017-07-12 13:18:25 -05:00
Goh Wei Xiang	53e6b5cb9b	Removal of TopNResultMerger because it is vestigial. (#4520 )	2017-07-12 13:24:07 +03:00
Roman Leventov	ad76f7a1ab	Make Filter.getBitmapResult() abstract (#4481 )	2017-07-11 12:39:32 -07:00
Akash Dwivedi	a108d05f76	Use GenericIndexed v2 supported read() during deserializeColumn (#4463 )	2017-07-11 10:18:25 -05:00
Roman Leventov	d168a4271e	Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY (#4496 ) * Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float * Replace usages in comments * Fix RTree * Remove commented code * Add tests	2017-07-07 09:10:13 -06:00
Roman Leventov	fc4fe24dd5	Incorrect use of Long.TYPE and Float.TYPE as return type of ObjectColumnSelector.classOfObject() (#4501 )	2017-07-05 08:54:06 -07:00
Jonathan Wei	97a79f4478	Fix GroupBy type cast when ChainedExecutionQueryRunner merges results (#4488 ) * Fix GroupBy type cast error when ChainedExecutionQueryRunner merges multiple runners * Move conversion step to separate method * Remove unnecessary comment * Use compute to update map	2017-06-30 17:33:03 -07:00
Roman Leventov	9ae457f7ad	Avoid using the default system Locale and printing to System.out in production code (#4409 ) * Avoid usages of Default system Locale and printing to System.out or System.err in production code * Fix Charset in DruidKerberosUtil * Remove redundant string format in GenericIndexed * Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format() * Fix testSafeFormat() * More fixes of redundant StringUtils.format() inside ISE * Rename unimportantSafeFormat() to nonStrictFormat()	2017-06-29 14:06:19 -07:00
Roman Leventov	ae900a4934	Update versions to 0.11.0-SNAPSHOT (#4483 )	2017-06-28 17:05:58 -07:00
Roman Leventov	6173570425	Add ExtensionsConfig.excludeModules (#4438 ) * Add ExtensionsConfig.excludeModules * Add branch * Refactor Initialization.getFromExtensions() * excludeModules -> moduleExcludeList * Initialization.getFromExtensions() and getLoadedModules() should return Collection, not Set * Fix doc	2017-06-28 14:01:31 -07:00
Gian Merlino	4c33d0a00f	Add some new expression functions and macros. (#4442 ) * Add some new expression functions and macros. See misc/math-expr.md for the list of added functions, except for "like", which previously existed but was not documented. * Add easymock to datasketches tests. * Add easymock to distinctcount tests. * Add easymock to virtual-columns tests. * Code review comments. * Clean up code a bit. * Add easymock to scan-query tests. * Rework ExprMacros that have multiple impls. * Improve test coverage.	2017-06-28 10:15:58 -07:00
Roman Leventov	2fa4b10145	More fine-grained DI for management node types. Don't allocate processing resources on Router (#4429 ) * Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router * Fix examples * Fixes * Revert Peon configs and add comments * Remove qualifier	2017-06-27 22:58:01 -07:00
Roman Leventov	05d58689ad	Remove the ability to create segments in v8 format (#4420 ) * Remove ability to create segments in v8 format * Fix IndexGeneratorJobTest * Fix parameterized test name in IndexMergerTest * Remove extra legacy merging stuff * Remove legacy serializer builders * Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest	2017-06-26 13:21:39 -07:00
Jihoon Son	3e60c9125d	Increase timeout of GroupByQueryMergeBufferTest and AppenderatorDriverTest (#4441 )	2017-06-22 06:09:52 -07:00
Jihoon Son	3a5c480405	Split IndexMergerTest and ImmutableConciseSetTest (#4427 )	2017-06-21 20:55:51 -07:00
Gian Merlino	34d2f9ebfe	Queries: Restore old prepareAggregations method. (#4432 ) For backwards compatibility, post-#4394.	2017-06-21 05:36:32 -07:00
Gian Merlino	679cf277c0	Add ExpressionFilter. (#4405 ) * Add ExpressionFilter. The expression filter expects a single argument, "expression", and matches rows where that expression is true. * Code review comments. * CR comment. * Fix logic. * Fix test. * Remove unused import.	2017-06-20 12:42:26 -07:00
Gian Merlino	22aad08a59	ExpressionPostAggregator: Automatically finalize inputs. (#4406 ) * ExpressionPostAggregator: Automatically finalize inputs. Raw HyperLogLogCollectors and such aren't very useful. When writing expressions like `x / y` users will expect `x` and `y` to be finalized. * Fix un-merge. * Code review comments. * Remove unnecessary ImmutableMap.copyOf.	2017-06-17 13:22:47 -07:00
Goh Wei Xiang	f68a0693f3	Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397 ) * Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe. * - Use of computeIfAbsent over putIfAbsent - Replace Maps.newXXXMap() with normal instantiation - Documentations on when is thread-safe required. - Use Builders for On/OffheapIncrementalIndex * - Optimization on computeIfAbsent - Constant EMPTY DimensionsSpec - Improvement on IncrementalIndexSchema.Builder - Remove setting of default values - Use var args for metrics - Correction on On/OffheapIncrementalIndex Builders - Combine On/OffheapIncrementalIndex Builders * - Removing unused imports. * - Helper method for testing with IncrementalIndex.Builder * - Correction on javadoc. * Style fix	2017-06-16 16:04:19 -05:00
Gian Merlino	054cf8a183	Limit random access in compressed column tests. (#4414 ) * Limit random access in compressed column tests. Random access leads to lots of block decompressions for reading single elements, which is time prohibitive for the large column tests. For those tests, limit the number of randomly accessed elements to 1000. * Random -> ThreadLocalRandom	2017-06-15 14:48:06 -07:00
Jonathan Wei	7fe295009e	Faster ByteBufferMinMaxOffsetHeapTest (#4404 )	2017-06-15 13:14:29 -05:00
Gian Merlino	6edee7f434	Expressions work better with strings. (#4394 ) * Expressions work better with strings. - ExpressionObjectSelector able to read from string columns, and able to return strings. - ExpressionVirtualColumn able to offer string (and long for that matter) as its native type. - ExpressionPostAggregator able to return strings. - groupBy, topN: Allow post-aggregators to accept dimensions as inputs, making ExpressionPostAggregator more useful. - topN: Use DimExtractionTopNAlgorithm for STRING columns that do not have dictionaries, allowing it to work with STRING-type expression virtual columns. - Adjusts null handling to better match the rest of Druid: null and empty string treated the same; nulls implicitly treated as zeroes in numeric context. * Code review comments. * More code review. * Fix test. * Adjust annotations.	2017-06-14 14:50:18 -07:00
Roman Leventov	113b8007b7	Increase timeout for QueryGranularityTest.testDeadLock() (#4395 )	2017-06-12 13:28:21 -07:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Jonathan Wei	9ae04b7375	Remove queryMetricsFactory from GroupByQueryConfig (#4383 )	2017-06-07 21:35:26 -05:00
Roman Leventov	63a897c278	Enable most IntelliJ 'Probable bugs' inspections (#4353 ) * Enable most IntelliJ 'Probable bugs' inspections * Fix in RemoteTestNG * Fix IndexSpec's equals() and hashCode() to include longEncoding * Fix inspection errors * Extract global instance of natural().nullsFirst(); address comments * Fix * Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections * Prohibit Ordering.natural().nullsFirst() using Checkstyle	2017-06-07 09:54:25 -07:00
Roman Leventov	b487fa355b	More methods in QueryMetrics and TopNQueryMetrics (the last part of #3798 ) (#4284 ) * Add more methods to QueryMetrics and TopNQueryMetrics, add BitmapResultFactory * Add implementor expectations section to BitmapResultFactory javadoc	2017-06-07 09:49:08 -07:00
kaijianding	551a89bd67	serialize DateTime As Long to improve json serde performance (#4038 )	2017-06-06 10:08:51 -07:00
Gian Merlino	d22db30db4	VirtualColumns: Block virtual columns with empty names. (#4367 ) * VirtualColumns: Block virtual columns with empty names. * Spelling.	2017-06-06 08:05:47 -07:00
Roman Leventov	31d33b333e	Make using implicit system Charset an error (#4326 ) * Make using implicit system charset an error * Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String() * Use English locale in StringUtils.safeFormat() * Restore comment	2017-06-05 23:57:25 -07:00
Jonathan Wei	b90c28e861	Support limit push down for GroupBy (#3873 ) * Support limit push down for GroupBy V2 * Use orderBy spec ordering when applying limit push down * PR Comments * Remove unused var * Checkstyle fixes * Fix test * Add comment on non-final variables, fix checkstyle * Address PR comments * PR comments * Remove unnecessary buffer reset * Fix missing @JsonProperty annotation	2017-06-02 15:39:04 -07:00
praveev	290ed3ab9d	Make DateTime timezone aware (#4343 ) * Make DateTime timezone aware * Change unit tests to make DateTime timezone aware for PeriodGranularity	2017-06-02 12:45:52 -07:00
kaijianding	0efd18247b	explicitly unmap hydrant files when abandonSegment to recycle mmap memory (#4341 ) * fix TestKafkaExtractionCluster fail due to port already used * explicitly unmap hydrant files when abandonSegment to recycle mmap memory * address the comments * apply to AppenderatorImpl	2017-06-01 18:15:30 -05:00
Roman Leventov	50e72c6aea	Fix bugs (core) (#4339 ) * Fix bugs * Add test for GoogleDataSegmentPusher.buildPath() * Exclude extension changes * Address comments * Brace	2017-06-02 06:47:59 +09:00
Roman Leventov	78179ef74d	Inject QueryMetrics factories via PolyBind (#4336 )	2017-05-31 09:07:03 -07:00
Kenji Noguchi	3400f601db	Protobuf extension (#4039 ) * move ProtoBufInputRowParser from processing module to protobuf extensions * Ported PR #3509 * add DynamicMessage * fix local test stuff that slipped in * add license header * removed redundant type name * removed commented code * fix code style * rename ProtoBuf -> Protobuf * pom.xml: shade protobuf classes, handle .desc resource file as binary file * clean up error messages * pick first message type from descriptor if not specified * fix protoMessageType null check. add test case * move protobuf-extension from contrib to core * document: add new configuration keys, and descriptions * update document. add examples * move protobuf-extension from contrib to core (2nd try) * touch * include protobuf extensions in the distribution * fix whitespace * include protobuf example in the distribution * example: create new pb obj everytime * document: use properly quoted json * fix whitespace * bump parent version to 0.10.1-SNAPSHOT * ignore Override check * touch	2017-05-30 13:11:58 -07:00
Kamal Gurala	dcb07d6958	Option to configure default analysis types in SegmentMetadataQuery (#4259 ) * Option to configure default analysis types * Updated Docs and renamed * Added serde tests and Null handling * Fixed Documentation * Updated implementation * Updated implementation * Updated implementation * Added usingDefaultIntervals in Builder * Updated implementation * Updated implementation and added failing test * filterSegments implementation updated * Updated implementation * Padding * Add missing Override * Updated implementation * Fixed a naming bug * Fixed bug * Removed comment	2017-05-26 12:12:39 -07:00
Gian Merlino	1eaa7887bd	Fix integer overflow in BufferGrouper. (#4333 ) Would have led to out of bounds buffer access with large buffers. Also added tests using large buffers.	2017-05-25 23:30:20 -07:00
Gian Merlino	2bd4c0930f	Fix "quarter" granularity serialization. (#4316 )	2017-05-23 10:06:17 -07:00
Gian Merlino	9283807ad7	GroupByQuery: Fix type-spanning comparisons. (#4317 ) Jackson deserializes integers sometimes as int and sometimes as long, depending on how big they are. This leads to ClassCastException when comparing deserialized values as part of groupBy merging on the broker.	2017-05-23 10:06:04 -07:00
Gian Merlino	22e5f52d00	Workaround for non-thread-safe use of CardinalityAggregator. (#4304 )	2017-05-23 10:33:03 +09:00
Roman Leventov	8ec3a29af0	Don't pass QueryMetrics down in concurrent and async QueryRunners (fixes #4279 ) (#4288 ) * Don't pass QueryMetrics down in concurrent and async QueryRunners * Rename QueryPlus.threadSafe() to withoutThreadUnsafeState(); Update QueryPlus.withQueryMetrics() Javadocs; Fix generics in MetricsEmittingQueryRunner and CpuTimeMetricQueryRunner; Make DefaultQueryMetrics to fail fast on modifications from concurrent threads	2017-05-22 13:42:09 -05:00
Maksim Logvinenko	d45dad2b44	Remove boxing/unboxing in indexer (#4269 ) * Remove boxing/unboxing in indexer * Fix rowIndex visibility * Cleanup	2017-05-17 19:13:53 -05:00
Roman Leventov	d9f423f55d	Make QueryMetrics factories configurable (#4268 ) * Ensure QueryMetrics factories accept Json ObjectMapper; Make QueryMetrics factories configurable * Update QueryMetrics Javadocs * Add javadocs to QueryMetrics factories * Move queryMetricsFactory defaults to getter methods of config classes	2017-05-17 08:41:59 -07:00
Gian Merlino	ddc2e68998	Remove cache keys from HavingSpecs. (#4280 ) * Remove cache keys from HavingSpecs. They weren't used, since they aren't part of the groupBy cache key. Also, it's good that they weren't used, since many of them had value truncation bugs. * Fix imports. * Fix test.	2017-05-16 22:13:02 -07:00
Roman Leventov	d400f23791	Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798 ) (#4079 ) * Monomorphic processing of topN queries with simple double aggregators and historical segments * Add CalledFromHotLoop annotations to specialized methods in SimpleDoubleBufferAggregator * Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype * Fix a bug in SpecializationService * In SpecializationService, emit maxSpecializations warning only once * Make GenericIndexed.theBuffer final * Address comments * Newline * Reapply `439c906` (Make GenericIndexed.theBuffer final) * Remove extra PooledTopNAlgorithm.capabilities field * Improve CachingIndexed.inspectRuntimeShape() * Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape() * Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts * Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop * Make ValueMatcher to implement HotLoopCallee * Doc fix * Fix inspectRuntimeShape() impl in ExpressionSelectors * INFO logging of specialization events * Remove modificator * Fix OrFilter * Fix AndFilter * Refactor PooledTopNAlgorithm.scanAndAggregate() * Small refactoring * Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations * Don't care about runtime shape in tests * Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests * Doc wording * Address comments * Remove MagicAccessorBridge and ensure Offset subclasses are public * Attach error message to element	2017-05-16 16:19:55 -07:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274 ) * Make MissingOverride an error * Make travis stript to fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
Himanshu	136b2fae72	improve query timeout handling and limit max scatter-gather bytes (#4229 ) * improve query timeout handling and limit max scatter-gather bytes * address review comments	2017-05-16 12:47:32 -05:00
Benedict Jin	e823085866	Improve `collection` related things by reusing an immutable object instead of creating a new object (#4135 )	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254 ) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Roman Leventov	1ebfa22955	Update Error prone configuration; Fix bugs (#4252 ) * Make Errorprone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule an error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Roman Leventov	e09e892477	Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798 ) (#4184 ) * Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map). * Fix GroupByMergingQueryRunnerV2 * Fix QueryResourceTest * Expand the comment to Query.run(walker, context) * Remove legacy version of BySegmentSkippingQueryRunner.doRun() * Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs	2017-05-10 12:25:00 -07:00
Himanshu	462f6482df	optionally add extensions to explicitly specified hadoopContainerClassPath (#4230 ) * optionally add extensions to explicitly specified hadoopContainerClassPath * note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly	2017-05-08 14:24:14 -05:00
Roman Leventov	8277284d67	Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239 )	2017-05-04 11:14:41 -07:00
Roman Leventov	5e85fcc0f5	Restore BaseQuery.computeOverridenContext() for compatibility (#4241 )	2017-05-02 10:22:02 -07:00
Himanshu	5a5a2749cd	improvements to coordinator lookups management (#3855 ) * coordinator lookups mgmt improvements * revert replaces removal, deprecate it instead * convert and use older specs stored in db * more tests and updates * review comments * add behavior for 0.10.0 to 0.9.2 downgrade * incorporating more review comments * remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well * wip on LookupCoordinatorManager * lifecycle lock * refactor thread creation into utility method * more review comments addressed * support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2 * correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop * run lookup mgmt on leader coordinator only * wip: changes to do multiple start() and stop() on LookupCoordinatorManager * lifecycleLock fix usage in LookupReferencesManagerTest * add LifecycleLock back * fix license hdr * some fixes * make LookupReferencesManager.getAllLookupsState() consistent while still being lockless * address review comments * addressing leventov's comments * address charle's comments * add IOE.java * for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt * move thread creation utility method to Execs * fix names * add tests for LookupCoordinatorManager.lookupManagementLoop() * add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager * address leventov comments * remove LookupsStateWithMap and parameterize LookupsState * address review comments * address more review comments * misc fixes	2017-04-28 08:41:38 -05:00
Roman Leventov	b9fd30e90a	Add Checkstyle check to prohibit IntelliJ-style commented code lines (#4220 ) * Add Checkstyle check to prohibit IntelliJ-style commented code lines * Address comment * Restore issue link	2017-04-27 18:11:25 -07:00
kaijianding	c47cfed0ec	Significantly improve LongEncodingStrategy.AUTO build performance (#4215 ) * Significantly improve LongEncodingStrategy.AUTO build performance * use numInserted instead of tempIn.available * fix bug	2017-04-27 15:11:07 +03:00
Roman Leventov	ee9b5a619a	Fix bugs in query builders and in TimeBoundaryQuery.getFilter() (#4131 ) * Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids * Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter() * Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner * Add compatibility constructor to BaseQuery * Remove Query.queryMetrics property * Move nullToNoopLimitSpec() method to LimitSpec interface * Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder	2017-04-25 16:32:02 -05:00
kaijianding	336089563d	skip rows which are added after cursor created (#4049 ) * fix can't get dim value via IncrementalIndexStorageAdapter cursor * address the comment * add ut * address ut comments * fix bug and fix ut	2017-04-26 03:26:46 +09:00
Jonathan Wei	723a855ab9	Fix nested groupBys with outer exfns on inner numeric columns (#4182 )	2017-04-21 19:47:46 -07:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191 )	2017-04-20 18:12:28 -07:00
Gian Merlino	60caa641f3	Restore backwards compatibility of Query. (#4185 )	2017-04-19 19:47:50 +03:00
Jihoon Son	5b69f2eff2	Make timeout behavior consistent to document (#4134 ) * Make timeout behavior consistent to document * Refactoring BlockingPool and add more methods to QueryContexts * remove unused imports * Addressed comments * Address comments * remove unused method * Make default query timeout configurable * Fix test failure * Change timeout from period to millis	2017-04-19 09:47:53 +09:00
Gian Merlino	b2954d5fea	Better groupBy error messages and docs around resource limits. (#4162 ) * Better groupBy error messages and docs around resource limits. * Fix BufferGrouper test from datasketches. * Further clarify.	2017-04-13 10:38:53 -07:00
Ram iyer	2e9589215e	removing unused var (#4163 )	2017-04-13 04:03:41 +09:00
kaijianding	676af79044	don't do postAgg in TimeseriesQueryQueryToolChest when not necessary (#4155 ) * don't do postAgg in TimeseriesQueryQueryToolChest when not necessary * set postAggs to empty list in TimeseriesQueryQueryToolChest instead of checking finalizing fn * fix ut * fix ut again	2017-04-12 15:46:46 +05:30
Roman Leventov	15f3a94474	Copy closer into Druid codebase (fixes #3652 ) (#4153 )	2017-04-10 09:38:45 +09:00
Roman Leventov	73d9b31664	GenericIndexed minor bug fixes, optimizations and refactoring (#3951 ) * Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream * Fixes * Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils * Move constructors * Add GenericIndexedBenchmark * Comments * Typo * Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe * Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap * Optimize TableLongEncodingWriter * Add checks to SerializerUtils methods * Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream() * Update GenericIndexedBenchmark * SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian * Add GenericIndexedBenchmark.indexOf() * More checks in methods in SerializerUtils * Use helperBuffer.arrayOffset() * Optimizations in SerializerUtils	2017-03-27 14:17:31 -05:00
Gian Merlino	dd6c0ab509	Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055 ) * Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. * Fix tests.	2017-03-24 17:38:36 -07:00
Erik Dubbelboer	2cbc4764f8	Comparing dimensions to each other in a filter (#3928 ) Comparing dimensions to each other using a select filter	2017-03-23 18:23:46 -07:00
Roman Leventov	4b5ae31207	QueryMetrics: abstraction layer of query metrics emitting (part of #3798 ) (#3954 ) * QueryMetrics: abstraction layer of query metrics emitting * Minor fixes * QueryMetrics.emit() for bulk emit and improve Javadoc * Fixes * Fix * Javadoc fixes * Typo * Use DefaultObjectMapper * Add tests * Address PR comments * Remove QueryMetrics.userDimensions(); Rename QueryMetric.register() to report() * Dedicated TopNQueryMetricsFactory, GroupByQueryMetricsFactory and TimeseriesQueryMetricsFactory * Typo * More elaborate Javadoc of QueryMetrics * Formatting * Replace QueryMetric enum with lambdas * Add comments and VisibleForTesting annotations	2017-03-23 17:23:59 -07:00
Jonathan Wei	79f1a1d7f0	Allow float parameters for Bound/Selector/In filters on long columns (#4074 ) * Allow float parameters for long filters * Use BigDecimal intermediate form for string->long conversions * PR comments * PR comments	2017-03-23 14:18:05 -07:00
Akash Dwivedi	ff7f90b02d	relocate method in BufferAggregator. (#4071 ) * relocate method in BufferAggregator. * Unused import. * Detailed javadoc. * using Int2ObjectMap. * batch relocate. * Revert batch relocate. * Unused import. * code comments. * code comment.	2017-03-23 13:07:59 -07:00
David Lim	f68ba4128f	Exclude pagingIdentifiers that don't apply to a datasource (#4078 ) * exclude pagingIdentifiers that don't apply to a datasource to support union datasources * code review changes * code review changes	2017-03-22 12:32:27 -07:00
Gian Merlino	1f48198607	Fix some query cache key collisions. (#4094 ) The query caches generally store dimensions and aggregators positionally, so appendCacheablesIgnoringOrder could lead to incorrect results being pulled from the cache.	2017-03-22 11:08:48 -07:00
Gian Merlino	77b6213222	Remove unused Filters.getLongValueMatcher method. (#4086 )	2017-03-21 13:46:07 -06:00
Gian Merlino	ad477cb454	Fix topNs with extractionFns but no aggregators. (#4070 ) The result sets were empty because of an aggs.length > 0 check. I'm not sure if it was there for any good reason, but there didn't seem to be one.	2017-03-20 11:31:30 -07:00
Roman Leventov	84fe91ba0b	Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798 ) (#3889 ) * Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing. * Use Execs.singleThreaded() * RuntimeShapeInspector to support nullable fields * Make CalledFromHotLoop annotation Inherited * Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory * Close InputStream in SpecializationService * Formatting * Test specialized PooledTopNScanners * Set flags in PooledTopNAlgorithm directly * Fix tests, dependent on CountAggregatorFactory toString() form * Fix * Revert CountAggregatorFactory changes * Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector * Remove duplicate RoaringBitmap dependency in the extendedset pom.xml * Fix * Treat ByteBuffers specially in StringRuntimeShape * Doc fix * Annotate BufferAggregator.init() with CalledFromHotLoop * Make triggerSpecializationIterationsThreshold an int * Remove SpecializationService.PerPrototypeClassState.of() * Add comments * Limit the amount of specializations that SpecializationService could make * Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions * Use more efficient ConcurrentMap's idioms in SpecializationService	2017-03-17 14:44:36 -05:00
Gian Merlino	3ec1877887	Fix BucketExtractionFn on objects that are strings. (#4072 )	2017-03-16 22:59:11 -07:00
Charles Allen	805d85afda	Allow compilation as Java8 source and target (#3328 ) * Allow compilation as Java8 source and target for everything except API * Remove conditions in tests which assume that we may run with Java 7 * Update easymock to 3.4 * Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration * Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity() * Remove java7 special for druid-api	2017-03-14 22:23:47 -06:00
Gian Merlino	e5c0dab12c	groupBy v2: Better error message when resources are exhausted. (#4046 ) * groupBy v2: Better error message when resources are exhausted. Fixes #4043. * Fix tests.	2017-03-15 00:37:49 +05:30
Jihoon Son	dfe4bda7fd	add doc (#4030 )	2017-03-10 12:49:20 -08:00
Gian Merlino	a5170666b6	groupBy v2: Always merge queries. (#4023 ) This fixes #4020 because it means the timestamp will always be included for outermost queries. Historicals receiving queries from older brokers will think they're outermost (because CTX_KEY_OUTERMOST isn't set to "false"), so they'll include a timestamp, so the older brokers will be OK.	2017-03-08 12:47:46 -06:00
Gian Merlino	4ca5270e88	Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004 ) * Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. Includes two fixes: - groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results. - Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y". Also includes doc and test fixes: - groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it is once again. - chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate. * Remove unused import. * Restore buffer size.	2017-03-06 12:27:02 -06:00
Gian Merlino	7b9e6c29cd	Fix float, long dimension indexer object selectors. (#4012 ) Their "convertUnsortedEncodedKeyComponentToActualArrayOrList" methods didn't respect the contract, which says they should return single values (not array/list) if there is only a single value to return. This affects the behavior of ObjectColumnSelectors on realtime segments.	2017-03-06 10:01:30 -08:00
Gian Merlino	337f3870d8	Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007 ) * Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. * Remove unused import. * Use defaults in cache key.	2017-03-04 17:41:59 -08:00
praveev	67d0ae3271	Let toDateTime call fall through for Duration Granularity (#4001 ) * Let toDateTime call fall through for Duration Granularity Added test for the same. * Add duration granularity test to GroupByQueryRunnerTest	2017-03-03 13:27:22 -06:00
Himanshu	e7e3c2dc5a	support singleThreaded flag for groupBy-v2 as well (#3992 )	2017-03-03 23:43:06 +05:30
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Jonathan Wei	5fb1638534	Add default configuration for select query 'fromNext' parameter (#3986 ) * Add default configuration for select query 'fromNext' parameter * PR comments * Fix PagingSpec config injection * Injection fix for test	2017-03-01 17:05:35 -08:00
Himanshu	8316b4f48f	fix TimeDimExtractionFn.apply() under concurrency (#3984 )	2017-03-01 13:07:12 -08:00
kaijianding	772de66e79	add filenameBase to log when exceed file size limit to indicate which column it is (#3982 )	2017-03-01 13:05:07 -08:00
Gian Merlino	cc20133e70	Checkstyle rule to outlaw tabs. (#3988 ) Tabs are the worst.	2017-02-28 23:52:53 -08:00
Akash Dwivedi	91344cbe57	Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987 ) * Enable GenericIndexed V2 for complex columns. * SerializerBuilder to use GenericColumnSerializer.	2017-02-28 22:06:54 -08:00
Jonathan Wei	a08660a9ca	Support ingestion of long/float dimensions (#3966 ) * Support ingestion for long/float dimensions * Allow non-arrays for key components in indexing type strategy interfaces * Add numeric index merge test, fixes * Docs for numeric dims at ingestion * Remove unused import * Adjust docs, add aggregate on numeric dims tests * remove unused imports * Throw exception for bitmap method on numerics * Move typed selector creation to DimensionIndexer interface * unused imports * Fix * Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter * Remove spaces	2017-02-28 19:04:41 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-granularity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Jonathan Wei	58b704c3b4	Don't allow '__time' as a GroupBy output field name (#3967 ) * Don't allow '__time' as a GroupBy column field name * Tweak exception message	2017-02-23 14:39:17 -08:00
kaijianding	7ce05d58bc	fix NPE in search query when dimension contains null value (#3968 ) * fix NPE when dimension contains null value in search query * add ut * search with not existed dimension should always return empty result	2017-02-23 08:07:59 -08:00
Gian Merlino	372b84991c	Add virtual columns to timeseries, topN, and groupBy. (#3941 ) * Add virtual columns to timeseries, topN, and groupBy. * Fix GroupByTimeseriesQueryRunnerTest. * Updates from review comments.	2017-02-22 13:16:48 -08:00
Jihoon Son	7200dce112	Atomic merge buffer acquisition for groupBys (#3939 ) * Atomic merge buffer acquisition for groupBys * documentation * documentation * address comments * address comments * fix test failure * Addressed comments - Add InsufficientResourcesException - Renamed GroupByQueryBrokerResource to GroupByQueryResource * addressed comments * Add takeBatch() to BlockingPool	2017-02-22 14:49:37 -06:00
Gian Merlino	985203b634	Finalize fields in postaggs (#3957 ) * initial commits for finalizeFieldAccess #2433 * fix some bugs to run a query * change name of method Queries.verifyAggregations to Queries.prepareAggregations * add Uts * fix Ut failures * rebased to master * address comments and add a Ut for arithmetic post aggregators * rebased to the master * address the comment of injection within arithmetic post aggregator * address comments and introduce decorate() in the PostAggregator interface. * Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name. * Fix a code style mismatch. * Rebased to the master	2017-02-21 16:32:14 -08:00
Gian Merlino	a47206eaf8	Ability to filter on virtual columns. (#3942 ) This didn't need much other than having BitmapIndexSelector return null from various methods to trigger cursor based filtering.	2017-02-21 16:03:31 -08:00
Jihoon Son	128274c6f0	Disable caching on brokers for groupBy v2 (#3950 ) * Disable caching on brokers for groupBy v2 * Rename parameter * address comments	2017-02-21 09:49:49 -08:00
Jonathan Wei	bc33b68b51	Use GroupBy V2 as default (#3953 ) * Use GroupBy V2 as default * Remove unused line * Change assert to exception propagation	2017-02-18 07:40:40 -08:00
kaijianding	361d9d9802	fix dynamic schema data can't rollup correctly (#3949 ) * fix dynamic schema data can't rollup correctly * add ut	2017-02-17 15:07:29 -06:00
Akash Dwivedi	797488a677	Removing Integer.MAX column size limit. (#3743 ) * Removing Integer.MAX column size limit. * On demand creation of headerLong, use v2 instead of v3 * Avoid reusing the same object from a previous test. * Avoid reusing the same object from a previous test part#2 * code formatting. * GenericIndexed/Writer code review changes. * GenericIndexed/writer code review requested changes. * checkIndex() to static * native endianness for genericIndexedV2, code review requested changes. * Formatting * Hll fix. * use native endianness during bag size calculation. * Code review requested changes. * IOPeon close() changes. * use different tmp directory path for testing. * Code review requested changes.	2017-02-16 20:09:43 -06:00
Jihoon Son	a459db68b6	Fine grained buffer management for groupby (#3863 ) * Fine-grained buffer management for group by queries * Remove maxQueryCount from GroupByRules * Fix code style * Merge master * Fix compilation failure * Address comments * Address comments - Revert Sequence - Add isInitialized() to Grouper - Initialize the grouper in RowBasedGrouperHelper.Accumulator - Simple refactoring RowBasedGrouperHelper.Accumulator - Add tests for checking the number of used merge buffers - Improve docs * Revert unnecessary changes * change to visible to testing * fix misspelling	2017-02-14 12:55:54 -08:00
Gian Merlino	af67e8904e	PreComputedHyperUniquesSerde: Fix formatting. (#3932 )	2017-02-14 09:32:29 -08:00
DaimonPl	a2875a4d91	pre-computed HLL support for hyperUnique aggregator (#3909 )	2017-02-13 15:26:20 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
Himanshu	9dfcf0763a	disable javascript execution by default (#3818 )	2017-02-13 15:11:18 -08:00
Pierre	9ab9feced6	Close all aggregators when closing onHeapIncrementalIndex (#3926 ) * Close all aggregators when closing onHeapIncrementalIndex * Aggregators are now handled as Closeables, remove unnecessary mock in test * Fix variable shadowing	2017-02-13 15:01:27 -08:00
Jihoon Son	991e2852da	Add PostAggregators to generator cache keys for top-n queries (#3899 ) * Add PostAggregators to generator cache keys for top-n queries * Add tests for strings * Remove debug comments * Add type keys and list sizes to cache key * Make post aggregators used for sort are considered for cache key generation * Use assertArrayEquals() * Improve findPostAggregatorsForSort() * Address comments * fix test failure * address comments	2017-02-13 12:23:44 -08:00
Parag Jain	33c635aff2	use as() method of base segment in reference counting segment (#3921 )	2017-02-09 20:24:47 -06:00
Jonathan Wei	ca2b04f0fd	Add long/float ColumnSelectorStrategy implementations (#3838 ) * Add long/float ColumnSelectorStrategy implementations * Address PR comments * Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests * PR comments * Use BaseSingleValueDimensionSelector for long/float wrapping * remove unused import * Address PR comments * PR comments * PR comments * More PR comments * Fix failing calcite histogram subquery tests * ScanQuery test and comment about isInputRaw * Add outputType to extractionDimensionSpec, tweak SQL tests * Fix limit spec optimization for numerics * Add cardinality sanity checks to TopN * Fix import from merge * Add tests for filtered dimension spec outputType * Address PR comments * Allow filtered dimspecs on numerics * More comments	2017-02-08 20:39:29 -08:00
Gian Merlino	97765fdfef	Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. (#3910 ) * Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. LikeFilter: - Reduce code duplication, and simplify methods, at the cost of incurring an extra box of ImmutableBitmap into a SingletonImmutableList. I think this is fine, since this should be cheap and the code path is not hot (just once per filter). Filters: - Make estimateSelectivity public since it seems intended that they be used by Filter implementations, and Filters from extensions may want to use them too. Removed @VisibleForTesting for the same reason. - Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since predicates aren't involved. * Address PR comments. * Remove unused import * Change List to Collection	2017-02-08 13:46:01 -06:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
Jihoon Son	ddd8c9ef97	Add filter selectivity estimation for auto search strategy (#3848 ) * Add filter selectivity estimation for auto search strategy * Addressed comments * Lazy bitmap materialization for bitmap sampling and java docs * Addressed comments. - Fix wrong non-overlap ratio computation and added unit tests. - Change Iterable<Integer> to IntIterable - Remove unnecessary Iterable<Integer> * Addressed comments - Split a long ternary operation into if-else blocks - Add IntListUtils.fromTo() * Fix test failure and add a test for RangeIntList * fix code style * Disabled selectivity estimation for multi-valued dimensions * Address comment	2017-02-06 11:15:03 -08:00
Parag Jain	8a13a85765	Introduce SegmentizerFactory (#3901 ) * Introduce SegmentizerFactory - that knows how to deserialize specific type of segment - Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment - Unit test for the default behavior * review comments	2017-02-06 10:05:12 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Jonathan Wei	182261f713	Allow configurable temp directory for query processing (#3893 )	2017-02-02 10:22:28 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Roman Leventov	75d9e5e7a7	DimensionSelector-related bug fixes and optimizations (fixes #3799 , part of #3798 ) (#3858 ) * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors support, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils; * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec able to decorate DimensionSelectors with unknown cardinality; * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers; * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector; * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher; * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn * Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher * Make ArrayBasedIndexedInts constructor private, replace its usages with of() static factory method * Cache baseIdLookup in ForwardingFilteredDimensionSelector * Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull) * Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available * Doc fixes * Addressed comments * Fix * Fix * Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector * throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm	2017-01-25 15:28:27 -08:00
Gian Merlino	3136dfa421	LikeFilter: Read value lazily when doing a prefix-based match. (#3880 ) This speeds up cases where we don't actually need to read the value, such as "LIKE 'foo%'".	2017-01-25 13:22:07 -08:00
Roman Leventov	af93a8d189	Sequences refactorings and removed unused code (part of #3798 ) (#3693 ) * Removing unused code from io.druid.java.util.common.guava package; fix #3563 (more consistent and paranoiac resource handling in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617) * Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close() * Fix strange code in MetricsEmittingQueryRunner * Add comment on why YieldingSequenceBase is used in Sequences.withEffect() * Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders	2017-01-19 20:07:43 -08:00
kaijianding	33ae9dd485	streaming version of select query (#3307 ) * streaming version of select query * use columns instead of dimensions and metrics;prepare for valueVector;remove granularity * respect query limit within historical * use constant * fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner * add some test for scan query * add scan query document * fix merge conflicts * add compactedList resultFormat, this format is better for json ser/der * respect query timeout * respect query limit on broker * use static consts and remove unused code	2017-01-19 16:09:53 -06:00
Slim	558dc365a4	renaming classes to be run by mvn and comment non operational tests (#3847 )	2017-01-17 11:59:12 -08:00
Akash Dwivedi	dd0c4e2ead	Migrating extendedset from Metamarkets. (#3694 ) * Migrating extendedset from Metamarkets. * Notice change * More details in NOTICE * NOTICE formatting. * suppress header checkstyle for extendedset.	2017-01-17 10:10:27 -08:00
Gian Merlino	e86859b228	SQL support for nested groupBys. (#3806 ) * SQL support for nested groupBys. Allows, for example, doing exact count distinct by writing: SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo) Contrast with approximate count distinct, which is: SELECT COUNT(DISTINCT col) FROM druid.foo Add deeply-nested groupBy docs, tests, and maxQueryCount config. * Extract magic constants into statics. * Rework rules to put preconditions in the "matches" method.	2017-01-11 18:32:53 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Jihoon Son	c099977a5b	Add an option to SearchQuery to choose a search query execution strategy (#3792 ) * Add an option to SearchQuery to choose a search query execution strategy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query * Add SearchStrategy and SearchQueryExecutor * Address comments * Rename strategies and set UseIndexesStrategy as the default strategy * Add a cost-based planner for auto strategy * Add document * Fix code style * apply code style * apply comments	2017-01-10 18:04:20 -08:00
Gian Merlino	3c63cff57a	Remove makeMathExpressionSelector from ColumnSelectorFactory. (#3815 ) * Remove makeMathExpressionSelector from ColumnSelectorFactory. * Add @Nullable annotations in places, fix Number.class check. * Break up createBindings, add tests. * Add null check.	2017-01-05 18:06:38 -08:00
Gian Merlino	220ca7ebb6	Ignore DimFilterHavingSpec testConcurrentUsage. (#3814 )	2017-01-03 17:43:58 -07:00
Gian Merlino	d8702ebece	Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797 ) * Filters: Use ColumnSelectorFactory directly for building row-based matchers. * Adjustments based on code review. - BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing. - HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec not thread safe. - Renamed rowType to rowSignature. - Added specializations for time-based vs non-time-based DimensionSelector in RBCSF. - Added convenience method DimensionHanderUtils.createColumnSelectorPlus. - Added singleton ZeroIndexedInts. - Added test cases for DimFilterHavingSpec. * Make ValueMatcherColumnSelectorStrategy actually use the associated selector. * Add RangeIndexedInts. * DimFilterHavingSpec: Fix concurrent usage guard on jdk7. * Add assertion to ZeroIndexedInts. * Rename no-longer-volatile members.	2017-01-03 14:30:22 -08:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Jonathan Wei	0e5bd8b4d4	Add dimension type-based interface for query processing (#3570 ) * Add dimension type-based interface for query processing * PR comment changes * Address PR comments * Use getters for QueryDimensionInfo * Split DimensionQueryHelper into base interface and query-specific interfaces * Treat empty rows as nulls in v2 groupby * Reduce boxing in SearchQueryRunner * Add GroupBy empty row handling to MultiValuedDimensionTest * Address PR comments * PR comments and refactoring * More PR comments * PR comments	2016-12-21 20:11:37 -07:00
Jonathan Wei	2bfcc8a592	First and Last Aggregator (#3566 ) * add first and last aggregator * add test and fix * moving around * separate aggregator valueType * address PR comment * add finalize inner query and adjust v1 inner indexing * better test and fixes * java-util import fixes * PR comments * Add first/last aggs to ITWikipediaQueryTest	2016-12-16 15:26:40 -08:00
Himanshu	ed322a4beb	remove size from default analysisTypes list for segmentMetadata query (#3773 )	2016-12-13 18:01:21 -08:00
Jonathan Wei	880a021a7a	Fix missed travis failures from PR 3567 and 2798 (#3761 ) * Fix checkstyle failures from PR 3567 * Fix GranularityPathSpecTest compile failure	2016-12-07 19:07:31 -08:00
Erik Dubbelboer	bb9e35e1af	Add Greatest and Least post aggregations (#3567 )	2016-12-07 17:58:23 -08:00
Roman Leventov	dc8f814acc	Optimize Iterator<ImmutableBitmap> implementation inside Filters.matchPredicate() so that it doesn't emit empty bitmap in the end of the iteration, and make it to follow Iterator contract, that is throw NoSuchElementException from next() if there are no more bitmaps (#3754 )	2016-12-07 12:54:09 -08:00
Jonathan Wei	d1896a2d62	Disable flush after every ObjectMapper write (#3748 )	2016-12-06 16:45:23 -08:00
Gian Merlino	b1bac9f2d3	groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740 ) * GroupByBenchmark: Add serde, spilling, all-gran benchmarks. Also use more iterations. * groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. Specifically: - Remove timestamp from RowBasedKey when not needed - Set timestamp to null in MapBasedRows that are not part of the final merge.	2016-12-06 16:17:32 -08:00
Himanshu	45da7e48f1	groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672 ) * sortByDimsFirst flag for groupBy query * Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType> * fix review comments * fix review comments regarding removing code duplication of dim/time comparison * move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde * remove unnecessary system.out.println * make access static var NATURAL_NULLS_FIRST directly * further review comments addressing	2016-12-06 09:48:56 -08:00
Navis Ryu	c74d267f50	Support virtual column for select query (#2511 ) * Support virtual column for select query * Addressed comments	2016-12-05 15:14:35 -08:00
Gian Merlino	b64e06704e	Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732 )	2016-12-02 15:27:54 -08:00
Gian Merlino	f4cc8c2b2f	IndexBuilder: Close IncrementalIndex when done. (#3734 )	2016-12-02 16:56:34 -06:00
Gian Merlino	353fee79dd	Add "asMillis" option to "timeFormat" extractionFn. (#3733 ) This is useful for chaining extractionFns that all want to treat time as millis, such as having a javascript extractionFn after a timeFormat.	2016-12-02 13:45:16 -08:00
Gian Merlino	102375d9bb	Add "strlen" extractionFn. (#3731 )	2016-12-02 12:08:51 -08:00
Gian Merlino	4c5d10f8a3	Add DimFilterHavingSpec. (#3727 ) * Add DimFilterHavingSpec. * Add test for DimFilterHavingSpec with extractionFns.	2016-12-02 10:04:30 -08:00
Gian Merlino	68735829ca	Add, fix equals, hashCode, toString on various classes. (#3723 ) * TimeFormatExtractionFn: Add toString. * InDimFilter: Add toString, allow accepting any Collection of values. * DimensionTopNMetricSpec: Fix toString. * InvertedTopNMetricSpec: Add toString. * HyperUniqueFinalizingPostAggregator: Add equals, hashCode, toString.	2016-11-30 19:00:14 -08:00
Gian Merlino	477e0cab7c	Filter fixes and tests (#3724 ) * More robust Filter tests. All Filter tests now exercise the CNF and post-filtering features. * Fixes to RowBasedValueMatcherFactory and to bound filters. - Change Comparables to Strings in ValueMatcher related code. - Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests. - Fix bound filters on long columns with non-numeric bounds, and add tests.	2016-11-30 16:10:05 -08:00
Gian Merlino	6922d684bf	GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686 ) v1 subqueries try to use aggregators to "transfer" values from the inner results to an incremental index, but aggregators can't transfer all kinds of values (strings are a common one). This is a workaround that selectively ignores what the outer aggregators ask for and instead assumes that we know best. These are in the same commit because the name validation changed the kinds of errors that were thrown by v1 subqueries.	2016-11-29 12:35:03 +05:30
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701 )	2016-11-22 16:42:28 -08:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Keuntae Park	094f5b851b	Support Min/Max for Timestamp (#3299 ) * Min/Max aggregator for Timestamp * remove unused imports and method * rebase and zip the test data * add docs	2016-11-14 23:00:21 -08:00
Gian Merlino	9ad34a3f03	groupBy v1: Force all dimensions to strings. (#3685 ) Fixes #3683.	2016-11-14 09:30:18 -08:00
Jisoo Kim	7c0f462fbc	fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666 )	2016-11-11 18:46:25 -08:00
Roman Leventov	fbbb55f867	Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679 ) * Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics * Remove unused imports * Use empty string instead of "testing-version" as a version placeholder	2016-11-11 17:17:27 -06:00
Akash Dwivedi	3e408497b3	Migrating bytebuffercollections from Metamarkets. (#3647 ) * Migrating bytebuffercollections from Metamarkets. * resolving code conflicts and removing <p> from bytebuffer-collections.	2016-11-11 10:51:07 -08:00
Gian Merlino	fd5451486c	Short-circuiting AndFilter. (#3676 ) If any of the bitmaps are empty, the result will be false.	2016-11-11 10:14:56 -08:00
Gian Merlino	657e4512d2	Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660 ) Excludes tests from AvoidStaticImport, since those are used often there and I didn't want to make this changeset too large. Production code use was minimal and I switched those to non-static imports.	2016-11-05 11:34:36 -07:00
Gian Merlino	4cbebd0931	SubstringDimExtractionFn, BoundDimFilter: Implement typical style toString. (#3658 )	2016-11-04 13:31:47 -07:00
Gian Merlino	600bbd4a17	BucketExtractionFn: Implement hashCode, fix toString. (#3656 )	2016-11-04 11:24:02 -07:00
Gian Merlino	8b3c86f41f	Fix FilteredAggregatorFactory toString formatting. (#3657 )	2016-11-04 11:23:55 -07:00
Gian Merlino	2c504b6258	Add "like" filter. (#3642 ) * Add "like" filter. * Addressed some PR comments. * Slight simplifications to LikeFilter. * Additional simplifications. * Fix comment in LikeFilter. * Clarify comment in LikeFilter. * Simplify LikeMatcher a bit. * No use going through the optimized path if prefix is empty. * Add more tests.	2016-11-04 23:25:03 +05:30
Navis Ryu	b99e14e732	Support configuration for handling multi-valued dimension (#2541 ) * Support configuration for handling multi-valued dimension * Addressed comments * use MultiValueHandling.ofDefault() for missing policy	2016-11-03 22:38:54 -06:00
Navis Ryu	e10def32f2	Support string type in math expression (#2836 ) * Support string type in math expression addressed comments addressed comments Addressed comments * Updated math function document * Addressed comments	2016-11-02 21:10:48 -06:00
kaijianding	2961406b90	fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644 )	2016-11-02 15:40:07 +05:30
Roman Leventov	4b0d6cf789	Fix resource leaks (ComplexColumn and GenericColumn) (#3629 ) * Remove unused ComplexColumnImpl class * Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs * Use concise try-with-resources syntax in several places * Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter * Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts threw exceptions	2016-11-02 09:23:52 +05:30
Gian Merlino	45940d6e40	Math expressions support for missing columns. (#3630 ) Also add SchemaEvolutionTest to help test this kind of thing. Fixes #3627 and includes test for #3625.	2016-11-01 09:40:25 -07:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Navis Ryu	3fca3be9ea	SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617 )	2016-10-30 11:47:29 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Navis Ryu	898c1c21af	More best-effort parse long (#3603 ) * More best-effort parse long * addressed comments	2016-10-25 10:31:51 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Navis Ryu	8b7ff4409a	Math expressional parameters for aggregator (#2783 ) * Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820) * Addressed comments * addressed comments	2016-10-19 13:58:35 -05:00
Roman Leventov	b113a34355	In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587 )	2016-10-18 09:06:42 -05:00
Charles Allen	2c5c8198db	Make query/cpu/time still report on error (#3535 )	2016-10-18 08:26:21 -05:00
Roman Leventov	9611358f0a	Small topn scan improvements (#3526 ) * Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue() * Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue() * Remove unused import	2016-10-17 10:36:19 -07:00
Gian Merlino	285516bede	Workaround non-thread-safe use of HLL aggregators. (#3578 ) Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before #3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.	2016-10-17 09:39:12 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
Roman Leventov	85ac8eff90	Improve performance of IndexMergerV9 (#3440 ) * Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9 * Don't mask index in MergeIntIterator.makeQueueElement() * DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator * Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable * Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }	2016-10-13 08:28:46 -07:00
Charles Allen	76e77cb610	Make segment creation Guava 14 friendly (#3520 )	2016-10-05 15:25:03 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Charles Allen	654e1db309	Add simple test to FunctionalExtractionTest (#3522 )	2016-09-28 23:45:15 -07:00
Gian Merlino	d5a8a35fec	groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502 ) - Fix GroupByRowProcessor config overrides - Fix GroupByRowProcessor resource limit checking - Invert subquery context overrides such that for the subquery, its own keys override keys from the outer query, not the other way around. The last bit is necessary for the test to work, and seems like a better way to do it anyway.	2016-09-23 14:41:09 -07:00
Gian Merlino	7195be32d8	groupBy v2: Fix dangling references. (#3500 ) Acquiring references in the processing task prevents dangling references caused by canceled processing tasks.	2016-09-24 01:59:11 +05:30
Gian Merlino	f8d71fc602	groupBy: Fix maxMergingDictionarySize config. (#3488 )	2016-09-22 10:02:33 -07:00
Gian Merlino	c87ecea975	Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487 )	2016-09-22 09:12:02 -07:00
Navis Ryu	49c0fe0e8b	Show candidate hosts for the given query (#2282 ) * Show candidate hosts for the given query * Added test cases & minor changes to address comments * Changed path-param to query-param for intervals/numCandidates	2016-09-22 08:32:38 -07:00
Keuntae Park	54ec4dd584	Support renaming of outputName for cached select and search query results (#3395 ) * support renaming of outputName for cached select and search queries * rebase and resolve conflicts * rollback CacheStrategy interface change * updated based on review comments	2016-09-20 08:19:14 -07:00
Charles Allen	95e08b38ea	[QTL] Reduced Locking Lookups (#3071 ) * Lockless lookups * Fix compile problem * Make stack trace throw instead * Remove non-germane change * * Add better naming to cache keys. Makes logging nicer * Fix #3459 * Move start/stop lock to non-interruptable for readability purposes	2016-09-16 11:54:23 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Gian Merlino	d108461838	groupBy v2: Parallel disk spilling. (#3433 ) In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch from hash-based partitioning to thread-based partitioning. This stops processing threads from blocking each other while spilling is occurring.	2016-09-09 16:49:58 -06:00
Gian Merlino	1e3f94237e	groupBy v2: Configurable load factor. (#3437 ) Also change defaults: - bufferGrouperMaxLoadFactor from 0.75 to 0.7. - maxMergingDictionarySize to 100MB from 25MB, should be more appropriate for most heaps.	2016-09-07 14:14:59 -05:00
Roman Leventov	4f0bcdce36	Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 (#3422 ) * Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The exact purpose for this change is to allow running IndexMergeBenchmark in Windows, however should also be universally 'better' than non-deterministic unmapping, done when MappedByteBuffers are garbage-collected (BACKEND-312) * Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9 * Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources * Reformat IndexIO	2016-09-07 10:43:47 -07:00
Gian Merlino	8d2ae144a8	groupBy: Short-circuit identity preCompute manipulators. (#3434 )	2016-09-06 22:28:44 -06:00
Gian Merlino	1d07964987	LimitedTemporaryStorage: Fix perf bug. (#3432 ) FilterOutputStream has an inefficient implementation of write(byte[], int, int). So let's extend OutputStream directly and use efficient implementations of all methods.	2016-09-06 15:39:36 -07:00
Gian Merlino	8ed1894488	groupBy: Omit timestamp from merge key when granularity = all. (#3416 ) Fixes #3412.	2016-09-01 09:02:54 -07:00
Gian Merlino	6d25c5e053	Avoid materializing all groupBy results with order + limit. (#3410 ) The old TopNFunction code did Sequences.toList on the input sequence before using a priority queue to find the top N items. Now, the priority queue is used in an accumulator, so there is no need to fully materialize the results. Also removed equals/hashCode from the limitFn and remove limitFn from the GroupByQuery's hashCode, since that wasn't necessary and the implementation of hashCode wasn't correct anyway.	2016-08-31 14:08:07 -07:00
Gian Merlino	1268e2902c	Add groupBy test for multiple multi-value dimensions. (#3415 )	2016-08-31 11:21:10 -07:00
Gian Merlino	e9050c2b4c	TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411 )	2016-08-31 20:58:53 +05:30
Keuntae Park	0076b5fc1a	Interval bug fix for search query (#2903 ) * support query granularity and interval for search query * skip unnecessary bitmap calculation when the query interval contains the whole data interval of the given segments. * use binary search to find start and end index for the given interval * fix based on comment * bug fix based on the review comments and add unit tests	2016-08-31 20:52:44 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Jonathan Wei	4e91330a17	Use DimensionSpec in CardinalityAggregatorFactory (#3406 ) * Use DimensionSpec in CardinalityAggregatorFactory * Address PR comments * Fix requiredFields()	2016-08-30 15:54:02 -07:00
Gian Merlino	b11e9544ea	GroupBy v2: Improve hash code distribution. (#3407 ) Without this transformation, distribution of hash % X is poor in general. It is catastrophically poor when X is a multiple of 31 (many slots would be empty).	2016-08-30 12:09:08 +05:30
kaijianding	f037dfcaa4	fix missing segments duplicate retried (#3398 )	2016-08-29 23:46:21 +05:30
jaehong choi	2e0f253c32	introducing lists of existing columns in the fields of select queries' output (#2491 ) * introducing lists of existing columns in the fields of select queries' output * rebase master * address the comment. add test code for select query caching * change the cache code in SelectQueryQueryToolChest to 0x16	2016-08-25 21:37:53 +05:30
rajk-tetration	362b9266f8	Adding filters for TimeBoundary on backend (#3168 ) * Adding filters for TimeBoundary on backend Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com> * updating TimeBoundaryQuery constructor in QueryHostFinderTest * add filter helpers * update filterSegments + test * Conditional filterSegment depending on whether a filter exists * Style changes * Trigger rebuild * Adding documentation for timeboundaryquery filtering * added filter serialization to timeboundaryquery cache * code style changes	2016-08-15 10:25:24 -07:00
Gian Merlino	e1b0b7de3e	IndexBuilder: Allow replacing rows, customizable maxRows. (#3359 )	2016-08-12 15:22:45 -07:00
Jonathan Wei	454587857c	Make StringComparator deserialization case-insensitive (#3356 )	2016-08-11 18:00:11 -07:00
Himanshu	043562914d	Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341 )	2016-08-10 11:03:44 -07:00
Gian Merlino	1eb7a7e882	Restore optimizations in BoundFilter. (#3343 )	2016-08-10 08:53:17 -07:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344 ) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00
Gian Merlino	8899affe48	Introduce standardized "Resource limit exceeded" error. (#3338 ) Fixes #3336.	2016-08-09 10:50:56 -07:00
Gian Merlino	21bce96c4c	More useful query errors. (#3335 ) Follow-up to #1773, which meant to add more useful query errors but did not actually do so. Since that patch, any error other than interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`. With this patch, the error fields are: - error, one of the specific strings "Query interrupted", "Query timeout", "Query cancelled", or "Unknown exception" (same behavior as before). - errorMessage, the message of the topmost non-QueryInterruptedException in the causality chain. - errorClass, the class of the topmost non-QueryInterruptedException in the causality chain. - host, the host that failed the query.	2016-08-09 16:14:52 +08:00
Gian Merlino	1aae5bd67d	Nicer handling for cancelled groupBy v2 queries. (#3330 ) 1. Wrap temporaryStorage in a resource holder, to avoid spurious "Closed" errors from already-running processing tasks. 2. Exit early from the merging accumulator if the query is cancelled.	2016-08-05 14:48:06 -07:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314 ) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
Gian Merlino	a4b95af839	Fix grouper closing in GroupByMergingQueryRunnerV2. (#3316 ) The grouperHolder should be closed on failure, not the grouper.	2016-08-02 21:02:30 -07:00
Gian Merlino	0299ac73b8	Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312 ) The common theme between the two is they both create "fake" DimensionSelectors that work on top of Rows. They both do it because there isn't really any dictionary for the underlying Rows, they're just a stream of data. The fix for both is to allow a DimensionSelector to tell callers that it has no dictionary by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in turn, can avoid using it in ways that assume it has a dictionary. Fixes #3311.	2016-08-02 17:39:40 -07:00
Gian Merlino	ae3e0015b6	Fix ClassCastException in nested v2 groupBys with timeouts. (#3310 ) Add tests for the CCE and for a bunch of other groupBy stuff. Also avoids setting the interrupted flag when InterruptedExceptions happen, since this might interfere with resource closing, no other query does it, and is probably pointless anyway since the thread is likely to be a jetty thread that we don't actually want to set an interrupt flag on. Also fixes toString on OrderByColumnSpec.	2016-08-02 16:02:44 -06:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
Jonathan Wei	0bdaaa224b	Use Long.compare for NumericComparator when possible (#3309 )	2016-08-01 20:36:56 -07:00
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Navis Ryu	884017d981	"all" type search query spec (#3300 ) * "all" type search query spec * addressed comments * added unit test	2016-07-28 18:16:15 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Gian Merlino	9b5523add3	Reference counting, better error handling for resources in groupBy v2. (#3268 ) Refcounting prevents releasing the merge buffer, or closing the concurrent grouper, before the processing threads have all finished. The better error handling prevents an avalanche of per-runner exceptions when grouping resources are exhausted, by grouping those all up into a single merged exception.	2016-07-27 01:59:02 +05:30
Erik Dubbelboer	76fabcfdb2	Fix #2782 , Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231 ) On systems with only one processor this test fails.	2016-07-25 15:51:09 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Gian Merlino	06624c40c0	Share query handling between Appenderator and RealtimePlumber. (#3248 ) Fixes inconsistent metric handling between the two implementations. Formerly, RealtimePlumber only emitted query/segmentAndCache/time and query/wait and Appenderator only emitted query/partial/time and query/wait (all per sink). Now they both do the same thing: - query/segmentAndCache/time, query/segment/time are the time spent per sink. - query/cpu/time is the CPU time spent per query. - query/wait/time is the executor waiting time per sink. These generally match historical metrics, except segmentAndCache & segment mean the same thing here, because one Sink may be partially cached and partially uncached and we aren't splitting that out.	2016-07-19 22:15:13 -05:00
Nishant	7995818220	Increase test timeout to prevent failing on slow machines (#3224 ) constantly timing out on one of slow build machines, increasing the timeout fixed it. Running io.druid.granularity.QueryGranularityTest Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776 sec - in io.druid.granularity.QueryGranularityTest	2016-07-17 18:44:48 -07:00
Gian Merlino	6cd1f5375b	Better harmonized dimensions for query metrics. (#3245 ) All query metrics now start with toolChest.makeMetricBuilder, and all of those now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id" moved to common code, since all query metrics added it anyway. In particular this will add query-type specific dimensions like "threshold" and "numDimensions" to servlet-originated metrics like query/time.	2016-07-14 11:55:51 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Gian Merlino	fdc7e88a7d	Allow queries with no aggregators. (#3216) This is actually reasonable for a groupBy or lexicographic topNs that is being used to do a "COUNT DISTINCT" kind of query. No aggregators are needed for that query, and including a dummy aggregator wastes 8 bytes per row. It's kind of silly for timeseries, but why not.	2016-07-06 20:38:54 +05:30
Jonathan Wei	f3a3662133	Fix compile error in SearchBinaryFnTest (#3201)	2016-06-29 09:44:45 -05:00
jaehong choi	efbcbf5315	Support alphanumeric sort in search query (#2593) * support alphanumeric sort in search query * address a comment about handling equals() and hashCode() * address comments * add Ut for string comparators * address a comment about space indentations.	2016-06-28 15:06:18 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166)	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
Dave Li	8a08398977	Add segment pruning based on secondary partition dimension (#2982 ) * add get dimension rangeset to filters * add get domain to ShardSpec and added chunk filter in caching clustered client * add null check and modified not filter, started with unit test * add filter test with caching * refactor and some comments * extract filtershard to helper function * fixup * minor changes * update javadoc	2016-06-24 14:52:19 -07:00
michaelschiff	66d8ad36d7	adds new coordinator metrics 'segment/unavailable/count' and (#3176) 'segment/underReplicated/count' (#3173)	2016-06-23 14:53:15 -07:00
Gian Merlino	da660bb592	DumpSegment tool. (#3182) Fixes #2723.	2016-06-23 14:37:50 -07:00
Gian Merlino	a437fb150b	Fix SegmentMetadataQuery when queryGranularity is requested but not present. (#3181)	2016-06-23 14:30:50 -07:00
Jonathan Wei	24860a1391	Two-stage filtering (#3018) * Two-stage filtering * PR comment	2016-06-22 16:08:21 -07:00
Nishant	f46ad9a4cb	support Union Segment metadata queries (#3132) * support Union Segment metadata queries fix 3128 * remove extraneous sys out	2016-06-21 10:30:50 -07:00
Dave Li	12be1c0a4b	Add bucket extraction function (#3033) * add bucket extraction function * add doc and header * updated doc and test	2016-06-17 09:24:27 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133)	2016-06-13 13:10:38 -07:00
Nishant	0d427923c0	fix caching for search results (#3119) * fix caching for search results properly read count when reading from cache. * fix NPE during merging search count and add test * Update cache key to invalidate prev results	2016-06-09 17:49:47 -07:00
Gian Merlino	5998de7d5b	Fix lenient merging of conflicting aggregators. (#3113) This should have marked the conflicting aggregator as null, but instead it threw an NPE for the entire query.	2016-06-08 15:56:48 -07:00
Jonathan Wei	37c8a8f186	Speed up filter tests with adapter cache (#3103)	2016-06-08 07:41:10 -07:00
Gian Merlino	54139c6815	Fix NPE in registeredLookup extractionFn when "optimize" is not provided. (#3064)	2016-06-03 12:58:17 -05:00
Gian Merlino	6171e078c8	Improve NPE message in LookupDimensionSpec when lookup does not exist. (#3065) The message used to be empty, which made things hard to debug.	2016-06-02 19:59:12 -07:00
John Wang	e662efa79f	segment interface refactor for proposal 2965 (#2990)	2016-05-26 20:36:41 -07:00
Kurt Young	b5bd406597	fix #2991: race condition in OnheapIncrementalIndex#addToFacts (#3002) * fix #2991: race condition in OnheapIncrementalIndex#addToFacts * add missing header * handle parseExceptions when first doing first agg	2016-05-25 19:05:46 -07:00
Jonathan Wei	b72c54c4f8	Add benchmark data generator, basic ingestion/persist/merge/query benchmarks (#2875)	2016-05-25 16:39:37 -07:00
Dave Li	dcabd4b1ee	Add lookup optimization for InDimFilter (#2938 ) * Add lookup optimization for InDimFilter * tests for in filter with lookup extraction fn * refactor * refactor2 and modified filter test * make optimizeLookup private	2016-05-19 16:29:16 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Charles Allen	fb01db4db7	[QTL] Allows RegisteredLookupExtractionFn to find its lookups lazily (#2971 ) * Allows RegisteredLookupExtractionFn to find its lookups lazily * Use raw variables instead of AtomicReference * Make sure to use volatile * Remove extra local variable. * Move from BAOS to ByteBuffer	2016-05-17 11:29:39 -07:00
Himanshu	d3e9c47a5f	use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943)	2016-05-13 10:06:31 +05:30
Himanshu	d821144738	at historicals GpBy query mergeResults does not need merging as results are already merged by GroupByQueryRunnerFactory.mergeRunners(..) (#2962)	2016-05-12 17:41:24 -07:00
Gian Merlino	01bebf432a	GroupByQuery: Multi-value dimension tests. (#2959)	2016-05-12 11:31:50 -07:00
Charles Allen	a31348450f	Add toString for LookupConfig (#2935) * Helps with operations and getting where the snapshot dir is	2016-05-09 18:20:00 -07:00
Dave Li	79a54283d4	Optimize filter for timeseries, search, and select queries (#2931) * Optimize filter for timeseries, search, and select queries * exception at failed toolchest type check * took out query type check * java7 error fix and test improvement	2016-05-09 11:04:06 -07:00
Slim	8b570ab130	make it clear what LookupExtractorFactory start/stop methods return (#2925)	2016-05-05 10:38:40 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Himanshu	8e2742b7e8	adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873)	2016-05-03 11:31:10 -07:00
Gian Merlino	40e595c7a0	Remove types from TimeAndDims, they aren't needed. (#2865)	2016-05-03 13:10:25 -05:00
binlijin	841be5c61f	periodically emit metric segment/scan/pending (#2854)	2016-05-02 22:38:13 -07:00
Navis Ryu	2729fea84d	Fix parsing fail of segment id with datasource containing underscore (#2797 ) * Fix parsing fail of segment id with underscored datasource (Fix for #2786) * addressed comment * renamed and moved code into api. added log4 dependency for tests * addressed comments * fixed test fails	2016-05-02 22:37:28 -07:00
Gian Merlino	90ce03c66f	Fix integer overflow in SegmentMetadataQuery numRows. (#2890)	2016-04-27 14:37:04 -07:00
Gian Merlino	6dc7688a29	TimeAndDims equals/hashCode implementation. (#2870) Adapted from #2692, thanks @navis for original implementation.	2016-04-22 08:45:20 +08:00
Himanshu	3cfd9c64c9	make singleThreaded groupBy query config overridable at query time (#2828 ) * make isSingleThreaded groupBy query processing overridable at query time * refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent	2016-04-21 17:12:58 -07:00
Slim	984a518c9f	Merge pull request #2734 from b-slim/LookupIntrospection2 [QTL][Lookup] adding introspection endpoint	2016-04-21 12:15:57 -05:00
Gian Merlino	c74391e54c	JavaScript: Ability to disable. (#2853) Fixes #2852.	2016-04-21 09:43:15 -05:00
Gian Merlino	7d3e55717d	Reduce cost of various toFilter calls. (#2860) These happen once per segment and so it's better if they don't do as much work.	2016-04-21 04:28:46 +08:00
Gian Merlino	59460b17cc	Add Filters.matchPredicate helper, use it where appropriate. (#2851) This approach simplifies code and is generally faster, due to skipping unnecessary dictionary lookups (see #2850).	2016-04-19 15:54:32 -07:00
Xavier Léauté	b2745befb7	remove obsolete comment (#2858)	2016-04-19 13:06:58 -07:00
Jisoo Kim	7b65ca7889	refactor ClientQuerySegmentWalker (#2837) * refactor ClientQuerySegmentWalker * add header to FluentQueryRunnerBuilder * refactor QueryRunnerTestHelper	2016-04-18 14:00:47 -07:00
Gian Merlino	7c0b1dde3a	DimensionPredicateFilter: Skip unnecessary dictionary lookup. (#2850)	2016-04-18 12:38:25 -07:00
Jonathan Wei	b534f7203c	Fix performance regression from #2753 in IndexMerger (#2841)	2016-04-14 21:39:41 -07:00
Jonathan Wei	a26134575b	Fix NPE in TopNLexicographicResultBuilder.addEntry() (#2835)	2016-04-13 17:27:16 -07:00
Fangjin Yang	abd951df1a	Document how to use roaring bitmaps (#2824 ) * Document how to use roaring bitmaps This fixes #2408. While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on. * fix * fix * fix * fix	2016-04-12 19:28:02 -07:00
michaelschiff	db35dd7508	fix issue #2744. Check for null before combining metrics (#2774)	2016-04-12 14:46:31 -07:00
Nishant	1bf1dd03a0	Merge pull request #2812 from mrijke/fix-missing-equals-hashcode-filters Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters	2016-04-12 12:00:23 +05:30
Charles Allen	21e406613c	Merge pull request #2809 from metamx/fix2694 Fix test for snapshot taker to better check for lookup persist failure	2016-04-11 14:52:47 -07:00
Maarten Rijke	de68d6b7c4	Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters This commit adds missing equals() and hashcode() methods to the JavascriptDimFilter, RegexDimFilter and the SearchQueryDimFilter.	2016-04-11 12:16:24 +02:00
Nishant	bbb326decf	Merge pull request #2799 from b-slim/fix_snapshot MapLookupFactory need to be Ser/Desr ready.	2016-04-07 13:22:34 +05:30
Slim Bouguerra	bf1eafc4e1	remove all the mock lookupFactory	2016-04-06 15:37:52 -05:00
Slim Bouguerra	59eb2490a0	MapLookupFactory need to be Ser/Desr.	2016-04-06 15:02:18 -05:00
Charles Allen	f915a59138	Merge pull request #2691 from metamx/lookupExtrFn Add ExtractionFn to LookupExtractor bridge	2016-04-06 09:13:08 -07:00
jon-wei	051fd6c0eb	Remove extra println from InFilter	2016-04-05 14:55:49 -07:00
Fangjin Yang	289bb6f885	Merge pull request #2690 from jon-wei/filter_support Allow filters to use extraction functions	2016-04-05 15:40:15 -06:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Gian Merlino	e060a9f283	Additional ExtractionFn null-handling adjustments. Followup to comments on #2771.	2016-04-01 18:35:26 -07:00
Fangjin Yang	18b9ea62cf	Merge pull request #2771 from gianm/extractionfn-stuff Various ExtractionFn null handling fixes.	2016-04-01 16:35:46 -07:00
Gian Merlino	23d66e5ff9	Merge pull request #2765 from navis/invalid-encode-nullstring Null string is encoded as "null" in incremental index	2016-04-01 14:43:40 -07:00
Gian Merlino	b6e4d8b2c1	Various ExtractionFn null handling fixes. - JavaScriptExtractionFn shouldn't pass empty strings to its JS functions - Upper/LowerExtractionFn properly handles null Objects (DimExtractionFn's implementation works here) - MatchingDimExtractionFn properly returns nulls rather than empties - RegexDimExtractionFn properly attempts matching on nulls and empties - SearchQuerySpecDimExtractionFn properly returns nulls when passed empties	2016-04-01 14:34:47 -07:00
Fangjin Yang	eea7a47870	Merge pull request #2576 from navis/paging-from-next Add option for select query to get next page without modifying returned paging identifiers	2016-04-01 13:50:36 -07:00
Fangjin Yang	4eb5a2c4f1	Merge pull request #2715 from navis/stringformat-null-handling stringFormat extractionFn should be able to return null on null values (Fix for #2706)	2016-04-01 13:45:28 -07:00
Gian Merlino	23364a47fd	BaseFilterTest: Test optimized filters too.	2016-04-01 12:44:59 -07:00
navis.ryu	077522a46f	stringFormat extractionFn should be able to return null on null values (Fix for #2706)	2016-04-01 13:40:56 +09:00
navis.ryu	f0e55f5d31	Null string is encoded as "null" in incremental index	2016-04-01 09:47:15 +09:00
navis.ryu	29bb00535b	Add option for select query to get next page without modifying returned paging identifiers	2016-04-01 09:03:03 +09:00
Gian Merlino	5f9240fcbc	Merge pull request #2577 from navis/native-in-filter Implement native in filter	2016-03-30 20:02:54 -07:00
Fangjin Yang	3d68da94fe	Merge pull request #2661 from navis/utf8-estimated-length Utility method for length estimation of utf8	2016-03-30 19:56:14 -07:00
navis.ryu	108535fd07	Implement native in filter (Fix for #2577)	2016-03-31 10:10:57 +09:00
navis.ryu	e0cfd9ee19	Utility method for length estimation of utf8	2016-03-31 10:07:00 +09:00
jon-wei	5503bf1b38	Remove unnecessary type check in TimeAndDimsComp	2016-03-30 17:54:15 -07:00
Fangjin Yang	95733a362f	Merge pull request #2753 from gianm/null-filtering-multi-value-columns More consistent empty-set filtering behavior on multi-value columns.	2016-03-29 18:52:25 -07:00
Charles Allen	95d42cfd9e	Merge pull request #2758 from pjain1/fix_npe_in_filter handle null values in In Filter	2016-03-29 17:53:02 -07:00
Gian Merlino	1853f36e9f	More consistent empty-set filtering behavior on multi-value columns. The behavior is now that filters on "null" will match rows with no values. The behavior in the past was inconsistent; sometimes these filters would match and sometimes they wouldn't. Adds tests for this behavior to SelectorFilterTest and BoundFilterTest, for query-level filters and filtered aggregates. Fixes #2750.	2016-03-29 15:32:13 -07:00
Parag Jain	d892918a3d	handle null values in In Filter	2016-03-29 17:03:26 -05:00
Fangjin Yang	e023df2b92	Merge pull request #2754 from gianm/i-dont-get-it Remove error suppression code from IncrementalIndexAdapter.	2016-03-28 19:29:53 -07:00
Gian Merlino	c7ff0d698e	Remove error suppression code from IncrementalIndexAdapter.	2016-03-28 18:40:27 -07:00
fjy	c418a55638	cleanup distinct count agg	2016-03-28 17:29:41 -07:00
Fangjin Yang	9cb197adec	Merge pull request #2722 from himanshug/fix_hadoop_jar_upload config to explicitly specify classpath for hadoop container during hadoop ingestion	2016-03-28 14:49:03 -07:00
Charles Allen	4a98c4fbac	Fix LookupExtractionFn equals and hashCode	2016-03-28 13:14:43 -07:00
Charles Allen	0ee861d0da	Add ExtractionFn to LookupExtractor bridge	2016-03-28 13:14:43 -07:00
Fangjin Yang	7fe277e6da	Merge pull request #2727 from gianm/optimize-bound-filter BoundFilter optimizations, and related interface changes.	2016-03-26 18:59:05 -07:00
Fangjin Yang	0dae28b6af	Merge pull request #2729 from jon-wei/fix_hyperunique_comparator Fix HyperUniquesAggregatorFactory comparator	2016-03-26 15:39:35 -07:00
Gian Merlino	2970b49adc	BoundFilter optimizations, and related interface changes. BoundFilter: - For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points, then union all bitmaps between those points. - For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values matching the predicate. - Change behavior for nulls: it used to be that the BoundFilter would never match nulls, now it matches nulls if "" is allowed by the lower limit and not excluded by the upper limit. Interface changes: - BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a value without retrieving the bitmap. - BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`. - BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex. - Clarified contract of indexOf in Indexed, GenericIndexed. Also added tests for SelectorFilter, NotFilter, and BoundFilter.	2016-03-25 14:11:48 -07:00
jon-wei	9afaa2b94a	Fix HyperUniquesAggregatorFactory comparator	2016-03-25 12:36:42 -07:00
Gian Merlino	4ac9e03161	Fix predicate-based ValueMatcher behavior for IncrementalIndex on missing columns. Missing columns should be treated the same as columns containing 100% nulls.	2016-03-25 10:23:59 -07:00
Himanshu Gupta	e78a469fb7	UTs for ExtensionsConfig	2016-03-25 10:51:28 -05:00
Himanshu Gupta	004b00bb96	config to explicitly specify classpath for hadoop container during hadoop ingestion	2016-03-25 10:51:28 -05:00
Nishant	0b03c9405f	Merge pull request #2614 from sirpkt/calendric_gran Support week, month, quarter, and year in query granularity	2016-03-24 16:21:01 -07:00
Himanshu	56343c6cdc	Merge pull request #2704 from navis/simple-optimize optimize single elemented and/or filter	2016-03-24 16:13:48 -05:00
Gian Merlino	713062053c	Filters: Add filter.toFilter method, use that instead of the instanceof chain in Filters. I believe that the instanceof chain in Filters exists because in the past, Filter and DimFilter were in different packages (DimFilter was in druid-client and Filter was in druid-processing). And since druid-client didn't depend on druid-processing, DimFilter couldn't have a toFilter method. But now it can.	2016-03-23 17:03:49 -07:00
Gian Merlino	dd86198902	All Filters should work with FilteredAggregators. This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a ValueMatcherFactory implementation to FilteredAggregatorFactory so it can take advantage of existing makeMatcher(ValueMatcherFactory) implementations. This patch also removes the Bound-based method from ValueMatcherFactory. Its only user was the SpatialFilter, which could use the Predicate-based method. Fixes #2604.	2016-03-23 12:24:01 -07:00
binlijin	57d78d3293	clean tmp file when index merge fail	2016-03-23 10:55:12 +08:00
navis.ryu	91f6be4884	optimize single elemented and/or filter	2016-03-23 09:29:15 +09:00
Gian Merlino	ff25325f3b	Improved docs for multi-value dimensions. - Add central doc for multi-value dimensions, with some content from other docs. - Link to multi-value dimension doc from topN and groupBy docs. - Fixes a broken link from dimensionspecs.md, which was presciently already linking to this nonexistent doc. - Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes "multi-value") in favor of "multi-value".	2016-03-22 14:40:55 -07:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Keuntae Park	7f29f2ac3b	support week, month, quarter, year in query granularity	2016-03-21 17:41:53 +09:00
Charles Allen	5da9a280b6	Query Time Lookup - Dynamic Configuration	2016-03-18 09:45:05 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Slim	cf342d8d3c	Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility [QTL][Lookup] lookup module with the snapshot utility	2016-03-17 11:39:47 -05:00
Slim Bouguerra	0c86b29ef0	lookup module with the snapshot utility	2016-03-17 09:20:41 -05:00
Charles Allen	2ac8a22173	Merge pull request #2579 from metamx/closerIsCloser Make CloserRule use guava's Closer	2016-03-14 17:18:19 -07:00
Charles Allen	a64979463f	Make CloserRule use guava's Closer	2016-03-14 15:01:24 -07:00
Fangjin Yang	06813b510a	Merge pull request #2571 from himanshug/gp_by_avoid_sort avoid sort while doing groupBy merging when possible	2016-03-14 14:46:51 -07:00
Fangjin Yang	dbdbacaa18	Merge pull request #2260 from navis/cardinality-for-searchquery Support cardinality for search query	2016-03-14 13:24:40 -07:00
Slim	8cc3582e70	Merge pull request #2644 from metamx/optimize-timeboundary optimize timeboundary for min or max bound	2016-03-13 13:16:24 -05:00
navis.ryu	be341bf4e3	Support cardinality for search query (Fix for #2260)	2016-03-12 09:51:01 +09:00
Xavier Léauté	6f0d6ef0e9	optimize timeboundary for min or max bound	2016-03-11 14:11:47 -08:00
Gian Merlino	8a11161b20	Plumbers: Move plumber.add out of try/catch for ParseException. The incremental indexes handle that now so it's not necessary. Also, add debug logging and more detailed exceptions to the incremental indexes for the case where there are parse exceptions during aggregation.	2016-03-10 16:39:26 -08:00
Himanshu Gupta	dc0214bddb	while GroupBy merging use unsorted facts in IncrementalIndex wherever possible	2016-03-10 16:11:48 -06:00
Himanshu Gupta	02dfd5cd80	update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance	2016-03-10 16:11:48 -06:00
Xavier Léauté	90d7409e1a	Merge pull request #2611 from himanshug/gp_by_max_limit only allow lowering maxResults and maxIntermediateRows from groupBy query context	2016-03-10 13:44:13 -08:00
Gian Merlino	a2b1652787	Clarify parser docs. - Clarify what parseSpecs are used for. - Avro, Protobuf should use timeAndDims parseSpecs. - Hadoop jobs should use hadoopyString string parsers.	2016-03-10 08:45:04 -08:00
Fangjin Yang	68cffe1d91	Merge pull request #2615 from gianm/timeseries-skipEmptyBuckets-cache Fix caching of skipEmptyBuckets for TimeseriesQuery.	2016-03-09 18:45:59 -08:00
Gian Merlino	708bc674fa	Make specifying query context booleans more consistent. Before, some needed to be strings and some needed to be real booleans. Now they can all be either one.	2016-03-08 19:38:26 -08:00
Gian Merlino	40dad6dff4	Fix caching of skipEmptyBuckets for TimeseriesQuery.	2016-03-08 19:22:12 -08:00
Himanshu Gupta	ca5de3f583	only allow lowering maxResults and maxIntermediateRows from groupBy query context	2016-03-08 15:03:59 -06:00
Himanshu Gupta	099acb4966	allow groupBy max[Intermediate]Rows limit be overridable by context	2016-03-07 15:22:41 -06:00
Himanshu Gupta	c544ebf25e	reintroducing the safety check removed in commit-1d602be so that dim value ids are less than cardinality	2016-03-03 23:34:23 -06:00
Bingkun Guo	4a58462fc7	update querySegmentSpec when passing query to getQueryRunner After finding the FireChief for a specific partition, Druid will need to find the specific queryRunner for each segment being queried by passing the query to FireChief. Currently Druid is passing the original query that contains all the segments need to be queried, it's possible that fireChief.getQueryRunner(query) returns more than 1 queryRunner because query.getIntervals() is not specific to a single segment. In this patch, for each segment being queried, Druid will update the query with its corresponding SpecificSegmentSpec.	2016-03-02 16:44:56 -06:00
Nishant	31b502773a	Merge pull request #2480 from navis/pagingfail-over-segments Select query cannot span to next segment with paging	2016-03-01 11:42:41 +05:30
Fangjin Yang	e5c25725c0	Merge pull request #2562 from himanshug/fix_2556 with nested GpBy query outer query results need to be further merged	2016-02-29 12:17:33 -08:00
Himanshu Gupta	0722ced413	with GpBy query outer query results need to be further merged	2016-02-29 10:16:25 -06:00
navis.ryu	b1ff920831	Lazily initialize predicate for bound filter	2016-02-29 15:35:52 +09:00
navis.ryu	5f1e60324a	Added more complex test case with versioned segments	2016-02-29 14:48:24 +09:00
navis.ryu	2686bfa394	Select query cannot span to next segment with paging	2016-02-29 00:01:46 +09:00
Fangjin Yang	29d29ba98d	Merge pull request #2263 from jon-wei/flex_dims3 Allow IncrementalIndex to store Long/Float dimensions	2016-02-25 17:23:02 -08:00
jon-wei	c17ce02467	Allow IncrementalIndex to store Long/Float dimensions	2016-02-24 13:51:57 -08:00
jon-wei	fd3782522c	Rename 'replaceMissingValues...' parameters in RegexExtractionFn	2016-02-24 13:12:56 -08:00
Nishant	fb7eae34ed	Merge pull request #2249 from metamx/workerExpanded Use Worker instead of ZkWorker whenever possible	2016-02-24 13:23:22 +05:30
Charles Allen	ac13a5942a	Use Worker instead of ZkWorker whenever possible * Moves last run task state information to Worker * Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker	2016-02-23 15:02:03 -08:00
Gian Merlino	3534483433	Better handling of ParseExceptions. Two changes: - Allow IncrementalIndex to suppress ParseExceptions on "aggregate". - Add "reportParseExceptions" option to realtime tuning configs. By default this is "false". Behavior of the counters should now be: - processed: Number of rows indexed, including rows where some fields could be parsed and some could not. - thrownAway: Number of rows thrown away due to rejection policy. - unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all). If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would cause an exception to be thrown). In addition, "processed" will only include fully parseable rows (because even partial parse failures will cause exceptions to be thrown). Fixes #2510.	2016-02-23 10:11:43 -08:00
Fangjin Yang	3bdd757024	Merge pull request #1773 from b-slim/log_details Adding downstream source when throwing QueryInterruptedException	2016-02-22 10:16:07 -08:00
Slim Bouguerra	77925cc061	adding downstream source of QueryInterruptedException	2016-02-20 13:05:14 -06:00
Fangjin Yang	8ee81947cd	Merge pull request #2494 from himanshug/fix_timeseries do not drop post-aggs in TimeseriesQueryToolChest.makePreComputeManipulatorFn	2016-02-20 10:37:32 -08:00
Gian Merlino	d25c46cb9f	Add comparator to HyperUniquesFinalizingPostAggregator. This makes it possible to do groupBys with clauses like "HAVING uniques > 10". Beforehand you couldn't do it with either an aggregator (because it returns an HLLV1 which the havingSpec can't understand) or a finalized postaggregator (because it didn't have a comparator). Now you can at least do it with a finalizing postaggregator. Trying it with the aggregator alone still doesn't work. Added some topN and groupBy tests verifying the comparator, and added an @Ignore test that should pass if havingSpecs are made work on the aggregator directly.	2016-02-19 08:36:08 -08:00
Himanshu Gupta	11b0117422	do not drop post-aggs in timeseries query tool chest makePreComputeManipulatorFn like other query types	2016-02-17 20:51:35 -06:00
Jaehong Choi	32b9d57b23	handle a failing UT in GroupByQueryRunnerTest after merging into the master	2016-02-16 16:56:57 +09:00
Jaehong Choi	b25bca85bc	Merge branch 'master' of https://github.com/druid-io/druid into support-alphanumeric-dimensional-sort-in-gropu-by	2016-02-16 16:42:05 +09:00
Jaehong Choi	e89afc901b	delete System.out.println() in test code	2016-02-16 15:26:37 +09:00
Navis Ryu	cd315627c9	Merge pull request #2393 from CHOIJAEHONG1/support-alphanumeric-dimensional-sort-in-gropu-by support alphanumeric sorting for dimensional columns in groupby (#2393)	2016-02-16 14:11:30 +09:00
Slim	16092eb5e2	Merge pull request #2464 from gianm/print-properties Make startup properties logging optional.	2016-02-14 15:11:35 -06:00
Gian Merlino	e0c049c0b0	Make startup properties logging optional. Off by default, but enabled in the example config files. See also #2452.	2016-02-12 14:12:16 -08:00
Himanshu Gupta	da5fcd0124	before facts get it, indexAndOffsets should already know about it	2016-02-12 13:32:06 -06:00
Jonathan Wei	d63eec65a1	Merge pull request #2208 from navis/metadataquery-minmax Support min/max values for metadata query	2016-02-11 17:28:07 -08:00
Jonathan Wei	e1b022eac9	Merge pull request #2349 from navis/dimensionspec-for-selectquery Support dimension spec for select query	2016-02-11 16:38:16 -08:00
navis.ryu	dd2375477a	Support min/max values for metadata query (#2208)	2016-02-12 09:35:58 +09:00
Gian Merlino	2d037ef05e	Merge pull request #2453 from DreamLab/fix/topn_sorting_anomaly Fix for unstable behavior of HyperLogLog comparator	2016-02-11 16:05:34 -08:00
navis.ryu	4d63196535	Support dimension spec for select query	2016-02-12 08:54:28 +09:00
Himanshu	47d48e1e67	Merge pull request #2452 from gianm/print-properties PropertiesModule: Print properties, processors, totalMemory on startup.	2016-02-11 16:49:34 -06:00
turu	f277a54a5c	removed unsafe heuristics from hll compareTo and provided unit test for regression	2016-02-11 23:46:24 +01:00
Slim	368988d187	Merge pull request #2291 from druid-io/lookupManager Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.	2016-02-11 16:07:27 -06:00
Gian Merlino	29f7758e74	PropertiesModule: Print properties, processors, totalMemory on startup.	2016-02-11 13:51:08 -08:00
Slim Bouguerra	4e119b7a24	Adding lookup ref manager and lookup dimension spec impl	2016-02-11 12:11:51 -06:00
Jaehong Choi	2f2e2ff5b9	support alphanumeric sorting for dimensional columns in groupby	2016-02-11 17:31:28 +09:00
Keuntae Park	05a144e39a	fix crash with filtered aggregator at ingestion time - only for selector filter because extraction filter is not supported as cardinality is not fixed at ingestion time	2016-02-11 11:25:33 +09:00
Fangjin Yang	b1673ee90e	Merge pull request #2409 from gianm/smq-merged-thing SegmentMetadataQuery: Retain segment id when merging, if possible.	2016-02-08 15:43:39 -08:00
Fangjin Yang	c9c20bb7f3	Merge pull request #2395 from metamx/fixExtractionDimFilterNullTest Actually check cache key null checking in ExtractionDimFilterTest	2016-02-08 14:10:52 -08:00
Gian Merlino	bd9c04244f	SegmentMetadataQuery: Retain segment id when merging, if possible. This is helpful on realtime nodes, where two analyses from two different hydrants are merged together but they are actually from the same segment.	2016-02-08 13:07:02 -08:00
Himanshu Gupta	9fe1b28ee5	provide configuration to enable usage of Off heap merging for groupBy query	2016-02-05 14:18:06 -06:00
Himanshu Gupta	b40c342cd1	make Global stupid pool cache size configurable	2016-02-05 14:18:06 -06:00
Himanshu Gupta	72a1e730a2	OffheapIncrementalIndex updates to do the aggregation merging off-heap	2016-02-05 14:17:05 -06:00
Himanshu Gupta	907dd77483	OffheapIncrementalIndex a copy/paste of OnheapIncrementalIndex	2016-02-05 14:02:31 -06:00
Charles Allen	aac5f9b2c9	Actually check cache key null checking in ExtractionDimFilterTest	2016-02-04 09:44:13 -08:00
fjy	1aa363cea7	new quickstart	2016-02-04 09:37:38 -08:00
Fangjin Yang	da77591129	Merge pull request #2392 from metamx/fix2391 Allow ExtractionDimFilter value to be null	2016-02-03 17:47:14 -08:00
Charles Allen	d4f00096ff	Allow ExtractionDimFilter value to be null * Fixes #2391	2016-02-03 15:51:47 -08:00
Himanshu Gupta	6e7d90cf56	UTs for DefaultLimitSpec	2016-02-03 15:59:12 -06:00
Himanshu Gupta	29e0d7f971	lazily create comparators for row columns when needed	2016-02-03 13:38:20 -06:00
navis.ryu	1d602be0f9	Replace string[] with int[] for dimensions	2016-02-03 15:03:22 +09:00
binlijin	a5ef30ff84	optimize topn on particular situation	2016-02-02 14:20:09 +08:00
Himanshu	93c50d8538	Merge pull request #2094 from navis/simplify-index-merge Simplifying dimension merging	2016-01-29 11:23:14 -06:00
navis.ryu	55a888ea2f	time-descending result of select queries	2016-01-29 10:06:05 +09:00
navis.ryu	dd774ef4dd	one-pass merging of dictionary & index	2016-01-29 10:03:53 +09:00
Himanshu	edd7ce58aa	Merge pull request #2348 from AlexanderSaydakov/fix-aggregator-test-helper fixed createIndex	2016-01-28 16:01:36 -06:00
saydakov	e0860661b1	fixed createIndex	2016-01-28 13:20:50 -08:00
Nishant	99017f4518	Merge pull request #2326 from navis/use-reverse-iterator use reverse-iterator if possible	2016-01-28 19:48:38 +05:30
Nishant	3880f54b87	Merge pull request #2332 from himanshug/configurable_partial make populateUncoveredIntervals a configuration in query context	2016-01-28 10:34:35 +05:30
navis.ryu	7324ece8f9	use reverse-iterator if possible	2016-01-28 09:04:55 +09:00
Xavier Léauté	5a3642bb93	Merge pull request #2247 from metamx/pedanticBuild Enable strict building in travis	2016-01-27 10:27:03 -08:00
Xavier Léauté	2e5004095a	Merge pull request #2341 from gianm/smq-test SegmentMetadataQuery: Fix merging of ColumnAnalysis errors.	2016-01-27 09:37:06 -08:00
Charles Allen	508734c8b0	Long constant reformatting in tests `l` --> `L`	2016-01-27 08:59:19 -08:00
Gian Merlino	b1e6c01762	Make LookupExtractor abstract methods public, they have to work across classloaders.	2016-01-26 23:08:03 -08:00
Gian Merlino	795343f7ef	SegmentMetadataQuery: Fix merging of ColumnAnalysis errors. Also add tests for: - ColumnAnalysis folding - Mixed mmap/incremental merging	2016-01-26 17:16:26 -08:00
Himanshu Gupta	3719b6e3c8	make populateUncoveredIntervals a configuration in query context	2016-01-26 15:13:45 -06:00
Himanshu	3844658fb5	Merge pull request #2323 from druid-io/update-druidapi Update druid-api to 0.3.16	2016-01-26 13:02:10 -06:00
Himanshu Gupta	09d3678667	adding single threaded indexing and querying test for IncrementalIndex	2016-01-23 00:17:14 -06:00
Charles Allen	0000b9fc62	Remove sorting in ProtoBufInputRowParserTest Due to processing/src/test/java/io/druid/data/input/ProtoBufInputRowParserTest.java	2016-01-22 16:02:25 -08:00
Himanshu Gupta	2f7f5119cf	older segments might not have field bitmapSerdeFactory for dimension columns and we must use appropriate default	2016-01-22 13:28:25 -06:00
binlijin	1d1f4d996d	Merge pull request #2111 from binlijin/optimize-create-inverted-indexes optimize create inverted indexes	2016-01-22 11:36:27 +08:00
binlijin	55f7dd4629	optimize create inverted indexes	2016-01-22 10:40:09 +08:00
Gian Merlino	d416279c14	SegmentMetadataQuery support for returning aggregators.	2016-01-21 17:27:25 -08:00
Fangjin Yang	5a9cd89059	Merge pull request #2305 from gianm/segment-metadata-query-multivalues Add StorageAdapter#getColumnTypeName, and various SegmentMetadataQuery adjustments	2016-01-21 17:22:34 -08:00
Gian Merlino	e5913be90e	Merge pull request #2257 from tubemogul/index-merge-bug Adds support for empty merge metrics. fixes #2256	2016-01-21 16:38:00 -08:00
Gian Merlino	87c8046c6c	Add StorageAdapter#getColumnTypeName, and various SegmentMetadataQuery adjustments. SegmentMetadataQuery stuff: - Simplify implementation of SegmentAnalyzer. - Fix type names for realtime complex columns; this used to try to merge a nice type name (like "hyperUnique") from mmapped segments with the word "COMPLEX" from incremental index segments, leading to a merge failure. Now it always uses the nice name. - Add hasMultipleValues to ColumnAnalysis. - Add tests for both mmapped and incremental index segments. - Update docs to include errorMessage.	2016-01-21 15:50:33 -08:00

2576 Commits