druid

Commit Graph

Author	SHA1	Message	Date
Jonathan Wei	5fb1638534	Add default configuration for select query 'fromNext' parameter (#3986 ) * Add default configuration for select query 'fromNext' parameter * PR comments * Fix PagingSpec config injection * Injection fix for test	2017-03-01 17:05:35 -08:00
Gian Merlino	cc20133e70	Checkstyle rule to outlaw tabs. (#3988 ) Tabs are the worst.	2017-02-28 23:52:53 -08:00
Jonathan Wei	a08660a9ca	Support ingestion of long/float dimensions (#3966 ) * Support ingestion for long/float dimensions * Allow non-arrays for key components in indexing type strategy interfaces * Add numeric index merge test, fixes * Docs for numeric dims at ingestion * Remove unused import * Adjust docs, add aggregate on numeric dims tests * remove unused imports * Throw exception for bitmap method on numerics * Move typed selector creation to DimensionIndexer interface * unused imports * Fix * Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter * Remove spaces	2017-02-28 19:04:41 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Jonathan Wei	58b704c3b4	Don't allow '__time' as a GroupBy output field name (#3967 ) * Don't allow '__time' as a GroupBy column field name * Tweak exception message	2017-02-23 14:39:17 -08:00
kaijianding	7ce05d58bc	fix NPE in search query when dimension contains null value (#3968 ) * fix NPE when dimension contains null value in search query * add ut * search with not existed dimension should always return empty result	2017-02-23 08:07:59 -08:00
Gian Merlino	372b84991c	Add virtual columns to timeseries, topN, and groupBy. (#3941 ) * Add virtual columns to timeseries, topN, and groupBy. * Fix GroupByTimeseriesQueryRunnerTest. * Updates from review comments.	2017-02-22 13:16:48 -08:00
Jihoon Son	7200dce112	Atomic merge buffer acquisition for groupBys (#3939 ) * Atomic merge buffer acquisition for groupBys * documentation * documentation * address comments * address comments * fix test failure * Addressed comments - Add InsufficientResourcesException - Renamed GroupByQueryBrokerResource to GroupByQueryResource * addressed comments * Add takeBatch() to BlockingPool	2017-02-22 14:49:37 -06:00
Gian Merlino	985203b634	Finalize fields in postaggs (#3957 ) * initial commits for finalizeFieldAccess #2433 * fix some bugs to run a query * change name of method Queries.verifyAggregations to Queries.prepareAggregations * add Uts * fix Ut failures * rebased to master * address comments and add a Ut for arithmetic post aggregators * rebased to the master * address the comment of injection within arithmetic post aggregator * address comments and introduce decorate() in the PostAggregator interface. * Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name. * Fix a code style mismatch. * Rebased to the master	2017-02-21 16:32:14 -08:00
Gian Merlino	a47206eaf8	Ability to filter on virtual columns. (#3942 ) This didn't need much other than having BitmapIndexSelector return null from various methods to trigger cursor based filtering.	2017-02-21 16:03:31 -08:00
Jonathan Wei	bc33b68b51	Use GroupBy V2 as default (#3953 ) * Use GroupBy V2 as default * Remove unused line * Change assert to exception propagation	2017-02-18 07:40:40 -08:00
kaijianding	361d9d9802	fix dynamic schema data can't rollup correctly (#3949 ) * fix dynamic schema data can't rollup correctly * add ut	2017-02-17 15:07:29 -06:00
Akash Dwivedi	797488a677	Removing Integer.MAX column size limit. (#3743 ) * Removing Integer.MAX column size limit. * On demand creation of headerLong, use v2 instead of v3 * Avoid reusing the same object from a previous test. * Avoid reusing the same object from a previous test part#2 * code formatting. * GenericIndexed/Writer code review changes. * GenericIndexed/writer code review requested changes. * checkIndex() to static * native endianess for genericIndexedV2, code review requested changes. * Formatting * Hll fix. * use native endianess during bag size calculation. * Code review requested changes. * IOPeon close() changes. * use different tmp directory path for testing. * Code review requested changes.	2017-02-16 20:09:43 -06:00
Jihoon Son	a459db68b6	Fine grained buffer management for groupby (#3863 ) * Fine-grained buffer management for group by queries * Remove maxQueryCount from GroupByRules * Fix code style * Merge master * Fix compilation failure * Address comments * Address comments - Revert Sequence - Add isInitialized() to Grouper - Initialize the grouper in RowBasedGrouperHelper.Accumulator - Simple refactoring RowBasedGrouperHelper.Accumulator - Add tests for checking the number of used merge buffers - Improve docs * Revert unnecessary changes * change to visible to testing * fix misspelling	2017-02-14 12:55:54 -08:00
DaimonPl	a2875a4d91	pre-computed HLL support for hyperUnique aggregator (#3909 )	2017-02-13 15:26:20 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
Himanshu	9dfcf0763a	disable javascript execution by default (#3818 )	2017-02-13 15:11:18 -08:00
Pierre	9ab9feced6	Close all aggregators when closing onHeapIncrementalIndex (#3926 ) * Close all aggregators when closing onHeapIncrementalIndex * Aggregators are now handled as Closeables, remove unnecessary mock in test * Fix variable shadowing	2017-02-13 15:01:27 -08:00
Jihoon Son	991e2852da	Add PostAggregators to generator cache keys for top-n queries (#3899 ) * Add PostAggregators to generator cache keys for top-n queries * Add tests for strings * Remove debug comments * Add type keys and list sizes to cache key * Make post aggregators used for sort are considered for cache key generation * Use assertArrayEquals() * Improve findPostAggregatorsForSort() * Address comments * fix test failure * address comments	2017-02-13 12:23:44 -08:00
Jonathan Wei	ca2b04f0fd	Add long/float ColumnSelectorStrategy implementations (#3838 ) * Add long/float ColumnSelectorStrategy implementations * Address PR comments * Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests * PR comments * Use BaseSingleValueDimensionSelector for long/float wrapping * remove unused import * Address PR comments * PR comments * PR comments * More PR comments * Fix failing calcite histogram subquery tests * ScanQuery test and comment about isInputRaw * Add outputType to extractionDimensionSpec, tweak SQL tests * Fix limit spec optimization for numerics * Add cardinality sanity checks to TopN * Fix import from merge * Add tests for filtered dimension spec outputType * Address PR comments * Allow filtered dimspecs on numerics * More comments	2017-02-08 20:39:29 -08:00
Gian Merlino	97765fdfef	Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. (#3910 ) * Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. LikeFilter: - Reduce code duplication, and simplify methods, at the cost of incurring an extra box of ImmutableBitmap into a SingletonImmutableList. I think this is fine, since this should be cheap and the code path is not hot (just once per filter). Filters: - Make estimateSelectivity public since it seems intended that they be used by Filter implementations, and Filters from extensions may want to use them too. Removed @VisibleForTesting for the same reason. - Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since predicates aren't involved. * Address PR comments. * Remove unused import * Change List to Collection	2017-02-08 13:46:01 -06:00
Jihoon Son	ddd8c9ef97	Add filter selectivity estimation for auto search strategy (#3848 ) * Add filter selectivity estimation for auto search strategy * Addressed comments * Lazy bitmap materialization for bitmap sampling and java docs * Addressed comments. - Fix wrong non-overlap ratio computation and added unit tests. - Change Iterable<Integer> to IntIterable - Remove unnecessary Iterable<Integer> * Addressed comments - Split a long ternary operation into if-else blocks - Add IntListUtils.fromTo() * Fix test failure and add a test for RangeIntList * fix code style * Diabled selectivity estimation for multi-valued dimensions * Address comment	2017-02-06 11:15:03 -08:00
Parag Jain	8a13a85765	Introduce SegmentizerFactory (#3901 ) * Introduce SegmentizerFactory - that knows how to deserialize specific type of segment - Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment - Unit test for the default behavior * review comments	2017-02-06 10:05:12 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Jonathan Wei	182261f713	Allow configurable temp directory for query processing (#3893 )	2017-02-02 10:22:28 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Roman Leventov	75d9e5e7a7	DimensionSelector-related bug fixes and optimizations (fixes #3799 , part of #3798 ) (#3858 ) * * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils; * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality; * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers; * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector; * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher; * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn * Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher * Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method * Cache baseIdLookup in ForwardingFilteredDimensionSelector * Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull) * Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available * Doc fixes * Addressed comments * Fix * Fix * Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector * throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm	2017-01-25 15:28:27 -08:00
Slim	558dc365a4	renaming classes to be run by mvn and comment non operational tests (#3847 )	2017-01-17 11:59:12 -08:00
Gian Merlino	e86859b228	SQL support for nested groupBys. (#3806 ) * SQL support for nested groupBys. Allows, for example, doing exact count distinct by writing: SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo) Contrast with approximate count distinct, which is: SELECT COUNT(DISTINCT col) FROM druid.foo Add deeply-nested groupBy docs, tests, and maxQueryCount config. * Extract magic constants into statics. * Rework rules to put preconditions in the "matches" method.	2017-01-11 18:32:53 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Jihoon Son	c099977a5b	Add an option to SearchQuery to choose a search query execution strategy (#3792 ) * Add an option to SearchQuery to choose a search query execution strategy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query * Add SearchStrategy and SearchQueryExecutor * Address comments * Rename strategies and set UseIndexesStrategy as the default strategy * Add a cost-based planner for auto strategy * Add document * Fix code style * apply code style * apply comments	2017-01-10 18:04:20 -08:00
Gian Merlino	3c63cff57a	Remove makeMathExpressionSelector from ColumnSelectorFactory. (#3815 ) * Remove makeMathExpressionSelector from ColumnSelectorFactory. * Add @Nullable annotations in places, fix Number.class check. * Break up createBindings, add tests. * Add null check.	2017-01-05 18:06:38 -08:00
Gian Merlino	220ca7ebb6	Ignore DimFilterHavingSpec testConcurrentUsage. (#3814 )	2017-01-03 17:43:58 -07:00
Gian Merlino	d8702ebece	Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797 ) * Filters: Use ColumnSelectorFactory directly for building row-based matchers. * Adjustments based on code review. - BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing. - HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec not thread safe. - Renamed rowType to rowSignature. - Added specializations for time-based vs non-time-based DimensionSelector in RBCSF. - Added convenience method DimensionHanderUtils.createColumnSelectorPlus. - Added singleton ZeroIndexedInts. - Added test cases for DimFilterHavingSpec. * Make ValueMatcherColumnSelectorStrategy actually use the associated selector. * Add RangeIndexedInts. * DimFilterHavingSpec: Fix concurrent usage guard on jdk7. * Add assertion to ZeroIndexedInts. * Rename no-longer-volatile members.	2017-01-03 14:30:22 -08:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Jonathan Wei	0e5bd8b4d4	Add dimension type-based interface for query processing (#3570 ) * Add dimension type-based interface for query processing * PR comment changes * Address PR comments * Use getters for QueryDimensionInfo * Split DimensionQueryHelper into base interface and query-specific interfaces * Treat empty rows as nulls in v2 groupby * Reduce boxing in SearchQueryRunner * Add GroupBy empty row handling to MultiValuedDimensionTest * Address PR comments * PR comments and refactoring * More PR comments * PR comments	2016-12-21 20:11:37 -07:00
Jonathan Wei	2bfcc8a592	First and Last Aggregator (#3566 ) * add first and last aggregator * add test and fix * moving around * separate aggregator valueType * address PR comment * add finalize inner query and adjust v1 inner indexing * better test and fixes * java-util import fixes * PR comments * Add first/last aggs to ITWikipediaQueryTest	2016-12-16 15:26:40 -08:00
Himanshu	ed322a4beb	remove size from default analysisTypes list for segmentMetadata query (#3773 )	2016-12-13 18:01:21 -08:00
Erik Dubbelboer	bb9e35e1af	Add Greatest and Least post aggregations (#3567 )	2016-12-07 17:58:23 -08:00
Himanshu	45da7e48f1	groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672 ) * sortByDimsFirst flag for groupBy query * Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType> * fix review comments * fix review comments regarding removing code duplication of dim/time comparison * move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde * remove unnecessary system.out.println * make access static var NATURAL_NULLS_FIRST directly * further review comments addressing	2016-12-06 09:48:56 -08:00
Navis Ryu	c74d267f50	Support virtual column for select query (#2511 ) * Support virtual column for select query * Addressed comments	2016-12-05 15:14:35 -08:00
Gian Merlino	b64e06704e	Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732 )	2016-12-02 15:27:54 -08:00
Gian Merlino	f4cc8c2b2f	IndexBuilder: Close IncrementalIndex when done. (#3734 )	2016-12-02 16:56:34 -06:00
Gian Merlino	353fee79dd	Add "asMillis" option to "timeFormat" extractionFn. (#3733 ) This is useful for chaining extractionFns that all want to treat time as millis, such as having a javascript extractionFn after a timeFormat.	2016-12-02 13:45:16 -08:00
Gian Merlino	102375d9bb	Add "strlen" extractionFn. (#3731 )	2016-12-02 12:08:51 -08:00
Gian Merlino	4c5d10f8a3	Add DimFilterHavingSpec. (#3727 ) * Add DimFilterHavingSpec. * Add test for DimFilterHavingSpec with extractionFns.	2016-12-02 10:04:30 -08:00
Gian Merlino	477e0cab7c	Filter fixes and tests (#3724 ) * More robust Filter tests. All Filter tests now exercise the CNF and post-filtering features. * Fixes to RowBasedValueMatcherFactory and to bound filters. - Change Comparables to Strings in ValueMatcher related code. - Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests. - Fix bound filters on long columns with non-numeric bounds, and add tests.	2016-11-30 16:10:05 -08:00
Gian Merlino	6922d684bf	GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686 ) v1 subqueries try to use aggregators to "transfer" values from the inner results to an incremental index, but aggregators can't transfer all kinds of values (strings are a common one). This is a workaround that selectively ignores what the outer aggregators ask for and instead assumes that we know best. These are in the same commit because the name validation changed the kinds of errors that were thrown by v1 subqueries.	2016-11-29 12:35:03 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Gian Merlino	9ad34a3f03	groupBy v1: Force all dimensions to strings. (#3685 ) Fixes #3683.	2016-11-14 09:30:18 -08:00
Jisoo Kim	7c0f462fbc	fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666 )	2016-11-11 18:46:25 -08:00
Akash Dwivedi	3e408497b3	Migrating bytebuffercollections from Metamarkets. (#3647 ) * Migrating bytebuffercollections from Metamarkets. * resolving code conflicts and removing <p> from bytebuffer-collections.	2016-11-11 10:51:07 -08:00
Gian Merlino	fd5451486c	Short-circuiting AndFilter. (#3676 ) If any of the bitmaps are empty, the result will be false.	2016-11-11 10:14:56 -08:00
Gian Merlino	600bbd4a17	BucketExtractionFn: Implement hashCode, fix toString. (#3656 )	2016-11-04 11:24:02 -07:00
Gian Merlino	2c504b6258	Add "like" filter. (#3642 ) * Add "like" filter. * Addressed some PR comments. * Slight simplifications to LikeFilter. * Additional simplifications. * Fix comment in LikeFilter. * Clarify comment in LikeFilter. * Simplify LikeMatcher a bit. * No use going through the optimized path if prefix is empty. * Add more tests.	2016-11-04 23:25:03 +05:30
Navis Ryu	b99e14e732	Support configuration for handling multi-valued dimension (#2541 ) * Support configuration for handling multi-valued dimension * Addressed comments * use MultiValueHandling.ofDefault() for missing policy	2016-11-03 22:38:54 -06:00
Navis Ryu	e10def32f2	Support string type in math expression (#2836 ) * Support string type in math expression addressed comments addressed comments Addressed comments * Updated math function document * Addressed comments	2016-11-02 21:10:48 -06:00
kaijianding	2961406b90	fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644 )	2016-11-02 15:40:07 +05:30
Gian Merlino	45940d6e40	Math expressions support for missing columns. (#3630 ) Also add SchemaEvolutionTest to help test this kind of thing. Fixes #3627 and includes test for #3625.	2016-11-01 09:40:25 -07:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Navis Ryu	3fca3be9ea	SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617 )	2016-10-30 11:47:29 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Navis Ryu	8b7ff4409a	Math expressional parameters for aggregator (#2783 ) * Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820) * Addressed comments * addressed comments	2016-10-19 13:58:35 -05:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
Roman Leventov	85ac8eff90	Improve performance of IndexMergerV9 (#3440 ) * Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9 * Don't mask index in MergeIntIterator.makeQueueElement() * DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator * Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable * Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }	2016-10-13 08:28:46 -07:00
Charles Allen	654e1db309	Add simple test to FunctionalExtractionTest (#3522 )	2016-09-28 23:45:15 -07:00
Gian Merlino	d5a8a35fec	groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502 ) - Fix GroupByRowProcessor config overrides - Fix GroupByRowProcessor resource limit checking - Invert subquery context overrides such that for the subquery, its own keys override keys from the outer query, not the other way around. The last bit is necessary for the test to work, and seems like a better way to do it anyway.	2016-09-23 14:41:09 -07:00
Gian Merlino	f8d71fc602	groupBy: Fix maxMergingDictionarySize config. (#3488 )	2016-09-22 10:02:33 -07:00
Gian Merlino	c87ecea975	Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487 )	2016-09-22 09:12:02 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Gian Merlino	1e3f94237e	groupBy v2: Configurable load factor. (#3437 ) Also change defaults: - bufferGrouperMaxLoadFactor from 0.75 to 0.7. - maxMergingDictionarySize to 100MB from 25MB, should be more appropriate for most heaps.	2016-09-07 14:14:59 -05:00
Gian Merlino	6d25c5e053	Avoid materializing all groupBy results with order + limit. (#3410 ) The old TopNFunction code did Sequences.toList on the input sequence before using a priority queue to find the top N items. Now, the priority queue is used in an accumulator, so there is no need to fully materialize the results. Also removed equals/hashCode from the limitFn and remove limitFn from the GroupByQuery's hashCode, since that wasn't necessary and the implementation of hashCode wasn't correct anyway.	2016-08-31 14:08:07 -07:00
Gian Merlino	1268e2902c	Add groupBy test for multiple multi-value dimensions. (#3415 )	2016-08-31 11:21:10 -07:00
Gian Merlino	e9050c2b4c	TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411 )	2016-08-31 20:58:53 +05:30
Keuntae Park	0076b5fc1a	Interval bug fix for search query (#2903 ) * support query granularity and interval for search query * skip unncessary bitmap calculation when query interval contains whole the data interval of the given segments. * use binary search to find start and end index for the given interval * fix based on comment * bug fix based on the review comments and add unit tests	2016-08-31 20:52:44 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Jonathan Wei	4e91330a17	Use DimensionSpec in CardinalityAggregatorFactory (#3406 ) * Use DimensionSpec in CardinalityAggregatorFactory * Address PR comments * Fix requiredFields()	2016-08-30 15:54:02 -07:00
kaijianding	f037dfcaa4	fix missing segments duplicate retried (#3398 )	2016-08-29 23:46:21 +05:30
jaehong choi	2e0f253c32	introducing lists of existing columns in the fields of select queries' output (#2491 ) * introducing lists of existing columns in the fields of select queries' output * rebase master * address the comment. add test code for select query caching * change the cache code in SelectQueryQueryToolChest to 0x16	2016-08-25 21:37:53 +05:30
rajk-tetration	362b9266f8	Adding filters for TimeBoundary on backend (#3168 ) * Adding filters for TimeBoundary on backend Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com> * updating TimeBoundaryQuery constructor in QueryHostFinderTest * add filter helpers * update filterSegments + test * Conditional filterSegment depending on whether a filter exists * Style changes * Trigger rebuild * Adding documentation for timeboundaryquery filtering * added filter serialization to timeboundaryquery cache * code style changes	2016-08-15 10:25:24 -07:00
Gian Merlino	e1b0b7de3e	IndexBuilder: Allow replacing rows, customizable maxRows. (#3359 )	2016-08-12 15:22:45 -07:00
Jonathan Wei	454587857c	Make StringComparator deserialization case-insensitive (#3356 )	2016-08-11 18:00:11 -07:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344 ) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00
Gian Merlino	8899affe48	Introduce standardized "Resource limit exceeded" error. (#3338 ) Fixes #3336.	2016-08-09 10:50:56 -07:00
Gian Merlino	21bce96c4c	More useful query errors. (#3335 ) Follow-up to #1773, which meant to add more useful query errors but did not actually do so. Since that patch, any error other than interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`. With this patch, the error fields are: - error, one of the specific strings "Query interrupted", "Query timeout", "Query cancelled", or "Unknown exception" (same behavior as before). - errorMessage, the message of the topmost non-QueryInterruptedException in the causality chain. - errorClass, the class of the topmost non-QueryInterruptedException in the causality chain. - host, the host that failed the query.	2016-08-09 16:14:52 +08:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314 ) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
Gian Merlino	0299ac73b8	Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312 ) The common theme between the two is they both create "fake" DimensionSelectors that work on top of Rows. They both do it because there isn't really any dictionary for the underlying Rows, they're just a stream of data. The fix for both is to allow a DimensionSelector to tell callers that it has no dictionary by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in turn, can avoid using it in ways that assume it has a dictionary. Fixes #3311.	2016-08-02 17:39:40 -07:00
Gian Merlino	ae3e0015b6	Fix ClassCastException in nested v2 groupBys with timeouts. (#3310 ) Add tests for the CCE and for a bunch of other groupBy stuff. Also avoids setting the interrupted flag when InterruptedExceptions happen, since this might interfere with resource closing, no other query does it, and is probably pointless anyway since the thread is likely to be a jetty thread that we don't actually want to set an interrupt flag on. Also fixes toString on OrderByColumnSpec.	2016-08-02 16:02:44 -06:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Navis Ryu	884017d981	"all" type search query spec (#3300 ) * "all" type search query spec * addressed comments * added unit test	2016-07-28 18:16:15 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Erik Dubbelboer	76fabcfdb2	Fix #2782 , Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231 ) On systems with only one processor this test fails.	2016-07-25 15:51:09 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Nishant	7995818220	Increase test timeout to prevent failing on slow machines (#3224 ) constantly timing out on one of the slow build machines, increasing the timeout fixed it. Running io.druid.granularity.QueryGranularityTest Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776 sec - in io.druid.granularity.QueryGranularityTest	2016-07-17 18:44:48 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Gian Merlino	fdc7e88a7d	Allow queries with no aggregators. (#3216 ) This is actually reasonable for a groupBy or lexicographic topNs that is being used to do a "COUNT DISTINCT" kind of query. No aggregators are needed for that query, and including a dummy aggregator wastes 8 bytes per row. It's kind of silly for timeseries, but why not.	2016-07-06 20:38:54 +05:30
Jonathan Wei	f3a3662133	Fix compile error in SearchBinaryFnTest (#3201 )	2016-06-29 09:44:45 -05:00
jaehong choi	efbcbf5315	Support alphanumeric sort in search query (#2593 ) * support alphanumeric sort in search query * address a comment about handling equals() and hashCode() * address comments * add Ut for string comparators * address a comment about space indentations.	2016-06-28 15:06:18 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
Dave Li	8a08398977	Add segment pruning based on secondary partition dimension (#2982 ) * add get dimension rangeset to filters * add get domain to ShardSpec and added chunk filter in caching clustered client * add null check and modified not filter, started with unit test * add filter test with caching * refactor and some comments * extract filtershard to helper function * fixup * minor changes * update javadoc	2016-06-24 14:52:19 -07:00
Jonathan Wei	24860a1391	Two-stage filtering (#3018 ) * Two-stage filtering * PR comment	2016-06-22 16:08:21 -07:00
Nishant	f46ad9a4cb	support Union Segment metadata queries (#3132 ) * support Union Segment metadata queries fix 3128 * remove extraneous sys out	2016-06-21 10:30:50 -07:00
Dave Li	12be1c0a4b	Add bucket extraction function (#3033 ) * add bucket extraction function * add doc and header * updated doc and test	2016-06-17 09:24:27 -07:00
Nishant	0d427923c0	fix caching for search results (#3119 ) * fix caching for search results properly read count when reading from cache. * fix NPE during merging search count and add test * Update cache key to invalidate prev results	2016-06-09 17:49:47 -07:00
Gian Merlino	5998de7d5b	Fix lenient merging of conflicting aggregators. (#3113 ) This should have marked the conflicting aggregator as null, but instead it threw an NPE for the entire query.	2016-06-08 15:56:48 -07:00
Jonathan Wei	37c8a8f186	Speed up filter tests with adapter cache (#3103 )	2016-06-08 07:41:10 -07:00
John Wang	e662efa79f	segment interface refactor for proposal 2965 (#2990 )	2016-05-26 20:36:41 -07:00
Kurt Young	b5bd406597	fix #2991 : race condition in OnheapIncrementalIndex#addToFacts (#3002 ) * fix #2991: race condition in OnheapIncrementalIndex#addToFacts * add missing header * handle parseExceptions when first doing first agg	2016-05-25 19:05:46 -07:00
Dave Li	dcabd4b1ee	Add lookup optimization for InDimFilter (#2938 ) * Add lookup optimization for InDimFilter * tests for in filter with lookup extraction fn * refactor * refactor2 and modified filter test * make optimizeLookup private	2016-05-19 16:29:16 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Charles Allen	fb01db4db7	[QTL] Allows RegisteredLookupExtractionFn to find its lookups lazily (#2971 ) * Allows RegisteredLookupExtractionFn to find its lookups lazily * Use raw variables instead of AtomicReference * Make sure to use volatile * Remove extra local variable. * Move from BAOS to ByteBuffer	2016-05-17 11:29:39 -07:00
Himanshu	d3e9c47a5f	use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943 )	2016-05-13 10:06:31 +05:30
Himanshu	d821144738	at historicals GpBy query mergeResults does not need merging as results are already merged by GroupByQueryRunnerFactory.mergeRunners(..) (#2962 )	2016-05-12 17:41:24 -07:00
Gian Merlino	01bebf432a	GroupByQuery: Multi-value dimension tests. (#2959 )	2016-05-12 11:31:50 -07:00
Dave Li	79a54283d4	Optimize filter for timeseries, search, and select queries (#2931 ) * Optimize filter for timeseries, search, and select queries * exception at failed toolchest type check * took out query type check * java7 error fix and test improvement	2016-05-09 11:04:06 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Himanshu	8e2742b7e8	adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873 )	2016-05-03 11:31:10 -07:00
Gian Merlino	6dc7688a29	TimeAndDims equals/hashCode implementation. (#2870 ) Adapted from #2692, thanks @navis for original implementation.	2016-04-22 08:45:20 +08:00
Slim	984a518c9f	Merge pull request #2734 from b-slim/LookupIntrospection2 [QTL][Lookup] adding introspection endpoint	2016-04-21 12:15:57 -05:00
Gian Merlino	c74391e54c	JavaScript: Ability to disable. (#2853 ) Fixes #2852.	2016-04-21 09:43:15 -05:00
Gian Merlino	7d3e55717d	Reduce cost of various toFilter calls. (#2860 ) These happen once per segment and so it's better if they don't do as much work.	2016-04-21 04:28:46 +08:00
Gian Merlino	59460b17cc	Add Filters.matchPredicate helper, use it where appropriate. (#2851 ) This approach simplifies code and is generally faster, due to skipping unnecessary dictionary lookups (see #2850).	2016-04-19 15:54:32 -07:00
Jisoo Kim	7b65ca7889	refactor ClientQuerySegmentWalker (#2837 ) * refactor ClientQuerySegmentWalker * add header to FluentQueryRunnerBuilder * refactor QueryRunnerTestHelper	2016-04-18 14:00:47 -07:00
Jonathan Wei	a26134575b	Fix NPE in TopNLexicographicResultBuilder.addEntry() (#2835 )	2016-04-13 17:27:16 -07:00
michaelschiff	db35dd7508	fix issue #2744 . Check for null before combining metrics (#2774 )	2016-04-12 14:46:31 -07:00
Nishant	1bf1dd03a0	Merge pull request #2812 from mrijke/fix-missing-equals-hashcode-filters Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters	2016-04-12 12:00:23 +05:30
Charles Allen	21e406613c	Merge pull request #2809 from metamx/fix2694 Fix test for snapshot taker to better check for lookup persist failure	2016-04-11 14:52:47 -07:00
Maarten Rijke	de68d6b7c4	Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters This commits adds missing equals() and hashcode() methods to the JavascriptDimFilter, RegexDimFilter and the SearchQueryDimFilter.	2016-04-11 12:16:24 +02:00
Nishant	bbb326decf	Merge pull request #2799 from b-slim/fix_snapshot MapLookupFactory needs to be Ser/Desr ready.	2016-04-07 13:22:34 +05:30
Slim Bouguerra	bf1eafc4e1	remove all the mock lookupFactory	2016-04-06 15:37:52 -05:00
Slim Bouguerra	59eb2490a0	MapLookupFactory needs to be Ser/Desr.	2016-04-06 15:02:18 -05:00
Charles Allen	f915a59138	Merge pull request #2691 from metamx/lookupExtrFn Add ExtractionFn to LookupExtractor bridge	2016-04-06 09:13:08 -07:00
Fangjin Yang	289bb6f885	Merge pull request #2690 from jon-wei/filter_support Allow filters to use extraction functions	2016-04-05 15:40:15 -06:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Gian Merlino	e060a9f283	Additional ExtractionFn null-handling adjustments. Followup to comments on #2771.	2016-04-01 18:35:26 -07:00
Fangjin Yang	18b9ea62cf	Merge pull request #2771 from gianm/extractionfn-stuff Various ExtractionFn null handling fixes.	2016-04-01 16:35:46 -07:00
Gian Merlino	23d66e5ff9	Merge pull request #2765 from navis/invalid-encode-nullstring Null string is encoded as "null" in incremental index	2016-04-01 14:43:40 -07:00
Gian Merlino	b6e4d8b2c1	Various ExtractionFn null handling fixes. - JavaScriptExtractionFn shouldn't pass empty strings to its JS functions - Upper/LowerExtractionFn properly handles null Objects (DimExtractionFn's implementation works here) - MatchingDimExtractionFn properly returns nulls rather than empties - RegexDimExtractionFn properly attempts matching on nulls and empties - SearchQuerySpecDimExtractionFn properly returns nulls when passed empties	2016-04-01 14:34:47 -07:00
Fangjin Yang	eea7a47870	Merge pull request #2576 from navis/paging-from-next Add option for select query to get next page without modifying returned paging identifiers	2016-04-01 13:50:36 -07:00
Fangjin Yang	4eb5a2c4f1	Merge pull request #2715 from navis/stringformat-null-handling stringFormat extractionFn should be able to return null on null values (Fix for #2706)	2016-04-01 13:45:28 -07:00

825 Commits