druid

Commit Graph

Author	SHA1	Message	Date
Parag Jain	8a13a85765	Introduce SegmentizerFactory (#3901 ) * Introduce SegmentizerFactory - that knows how to deserialize specific type of segment - Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment - Unit test for the default behavior * review comments	2017-02-06 10:05:12 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Jonathan Wei	182261f713	Allow configurable temp directory for query processing (#3893 )	2017-02-02 10:22:28 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Roman Leventov	75d9e5e7a7	DimensionSelector-related bug fixes and optimizations (fixes #3799 , part of #3798 ) (#3858 ) * * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils; * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality; * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers; * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector; * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher; * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn * Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher * Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method * Cache baseIdLookup in ForwardingFilteredDimensionSelector * Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull) * Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available * Doc fixes * Addressed comments * Fix * Fix * Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector * throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm	2017-01-25 15:28:27 -08:00
Gian Merlino	3136dfa421	LikeFilter: Read value lazily when doing a prefix-based match. (#3880 ) This speeds up cases where we don't actually need to read the value, such as "LIKE 'foo%'".	2017-01-25 13:22:07 -08:00
Roman Leventov	af93a8d189	Sequences refactorings and removed unused code (part of #3798 ) (#3693 ) * Removing unused code from io.druid.java.util.common.guava package; fix #3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617) * Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close() * Fix strange code in MetricsEmittingQueryRunner * Add comment on why YieldingSequenceBase is used in Sequences.withEffect() * Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders	2017-01-19 20:07:43 -08:00
kaijianding	33ae9dd485	streaming version of select query (#3307 ) * streaming version of select query * use columns instead of dimensions and metrics;prepare for valueVector;remove granularity * respect query limit within historical * use constant * fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner * add some test for scan query * add scan query document * fix merge conflicts * add compactedList resultFormat, this format is better for json ser/der * respect query timeout * respect query limit on broker * use static consts and remove unused code	2017-01-19 16:09:53 -06:00
Slim	558dc365a4	renaming classes to be run by mvn and comment non operational tests (#3847 )	2017-01-17 11:59:12 -08:00
Akash Dwivedi	dd0c4e2ead	Migrating extendedset from Metamarkets. (#3694 ) * Migrating extendedset from Metamarkets. * Notice change * More details in NOTICE * NOTICE formatting. * suppress header checkstlye for extendedset.	2017-01-17 10:10:27 -08:00
Gian Merlino	e86859b228	SQL support for nested groupBys. (#3806 ) * SQL support for nested groupBys. Allows, for example, doing exact count distinct by writing: SELECT COUNT() FROM (SELECT DISTINCT col FROM druid.foo) Contrast with approximate count distinct, which is: SELECT COUNT(DISTINCT col) FROM druid.foo Add deeply-nested groupBy docs, tests, and maxQueryCount config. * Extract magic constants into statics. * Rework rules to put preconditions in the "matches" method.	2017-01-11 18:32:53 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Jihoon Son	c099977a5b	Add an option to SearchQuery to choose a search query execution strategy (#3792 ) * Add an option to SearchQuery to choose a search query execution strategy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query * Add SearchStrategy and SearchQueryExecutor * Address comments * Rename strategies and set UseIndexesStrategy as the default strategy * Add a cost-based planner for auto strategy * Add document * Fix code style * apply code style * apply comments	2017-01-10 18:04:20 -08:00
Gian Merlino	3c63cff57a	Remove makeMathExpressionSelector from ColumnSelectorFactory. (#3815 ) * Remove makeMathExpressionSelector from ColumnSelectorFactory. * Add @Nullable annotations in places, fix Number.class check. * Break up createBindings, add tests. * Add null check.	2017-01-05 18:06:38 -08:00
Gian Merlino	220ca7ebb6	Ignore DimFilterHavingSpec testConcurrentUsage. (#3814 )	2017-01-03 17:43:58 -07:00
Gian Merlino	d8702ebece	Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797 ) * Filters: Use ColumnSelectorFactory directly for building row-based matchers. * Adjustments based on code review. - BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing. - HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec not thread safe. - Renamed rowType to rowSignature. - Added specializations for time-based vs non-time-based DimensionSelector in RBCSF. - Added convenience method DimensionHanderUtils.createColumnSelectorPlus. - Added singleton ZeroIndexedInts. - Added test cases for DimFilterHavingSpec. * Make ValueMatcherColumnSelectorStrategy actually use the associated selector. * Add RangeIndexedInts. * DimFilterHavingSpec: Fix concurrent usage guard on jdk7. * Add assertion to ZeroIndexedInts. * Rename no-longer-volatile members.	2017-01-03 14:30:22 -08:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Jonathan Wei	0e5bd8b4d4	Add dimension type-based interface for query processing (#3570 ) * Add dimension type-based interface for query processing * PR comment changes * Address PR comments * Use getters for QueryDimensionInfo * Split DimensionQueryHelper into base interface and query-specific interfaces * Treat empty rows as nulls in v2 groupby * Reduce boxing in SearchQueryRunner * Add GroupBy empty row handling to MultiValuedDimensionTest * Address PR comments * PR comments and refactoring * More PR comments * PR comments	2016-12-21 20:11:37 -07:00
Jonathan Wei	2bfcc8a592	First and Last Aggregator (#3566 ) * add first and last aggregator * add test and fix * moving around * separate aggregator valueType * address PR comment * add finalize inner query and adjust v1 inner indexing * better test and fixes * java-util import fixes * PR comments * Add first/last aggs to ITWikipediaQueryTest	2016-12-16 15:26:40 -08:00
Himanshu	ed322a4beb	remove size from default analysisTypes list for segmentMetadata query (#3773 )	2016-12-13 18:01:21 -08:00
Jonathan Wei	880a021a7a	Fix missed travis failures from PR 3567 and 2798 (#3761 ) * Fix checkstyle failures from PR 3567 * Fix GranularityPathSpecTest compile failure	2016-12-07 19:07:31 -08:00
Erik Dubbelboer	bb9e35e1af	Add Greatest and Least post aggregations (#3567 )	2016-12-07 17:58:23 -08:00
Roman Leventov	dc8f814acc	Optimize Iterator<ImmutableBitmap> implementation inside Filters.matchPredicate() so that it doesn't emit empty bitmap in the end of the iteration, and make it to follow Iterator contract, that is throw NoSuchElementException from next() if there are no more bitmaps (#3754 )	2016-12-07 12:54:09 -08:00
Jonathan Wei	d1896a2d62	Disable flush after every ObjectMapper write (#3748 )	2016-12-06 16:45:23 -08:00
Gian Merlino	b1bac9f2d3	groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740 ) * GroupByBenchmark: Add serde, spilling, all-gran benchmarks. Also use more iterations. * groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. Specifically: - Remove timestamp from RowBasedKey when not needed - Set timestamp to null in MapBasedRows that are not part of the final merge.	2016-12-06 16:17:32 -08:00
Himanshu	45da7e48f1	groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672 ) * sortByDimsFirst flag for groupBy query * Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType> * fix review comments * fix review comments regarding removing code duplication of dim/time comparison * move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde * remove unnecessary system.out.println * make access static var NATURAL_NULLS_FIRST directly * further review comments addressing	2016-12-06 09:48:56 -08:00
Navis Ryu	c74d267f50	Support virtual column for select query (#2511 ) * Support virtual column for select query * Addressed comments	2016-12-05 15:14:35 -08:00
Gian Merlino	b64e06704e	Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732 )	2016-12-02 15:27:54 -08:00
Gian Merlino	f4cc8c2b2f	IndexBuilder: Close IncrementalIndex when done. (#3734 )	2016-12-02 16:56:34 -06:00
Gian Merlino	353fee79dd	Add "asMillis" option to "timeFormat" extractionFn. (#3733 ) This is useful for chaining extractionFns that all want to treat time as millis, such as having a javascript extractionFn after a timeFormat.	2016-12-02 13:45:16 -08:00
Gian Merlino	102375d9bb	Add "strlen" extractionFn. (#3731 )	2016-12-02 12:08:51 -08:00
Gian Merlino	4c5d10f8a3	Add DimFilterHavingSpec. (#3727 ) * Add DimFilterHavingSpec. * Add test for DimFilterHavingSpec with extractionFns.	2016-12-02 10:04:30 -08:00
Gian Merlino	68735829ca	Add, fix equals, hashCode, toString on various classes. (#3723 ) * TimeFormatExtractionFn: Add toString. * InDimFilter: Add toString, allow accepting any Collection of values. * DimensionTopNMetricSpec: Fix toString. * InvertedTopNMetricSpec: Add toString. * HyperUniqueFinalizingPostAggregator: Add equals, hashCode, toString.	2016-11-30 19:00:14 -08:00
Gian Merlino	477e0cab7c	Filter fixes and tests (#3724 ) * More robust Filter tests. All Filter tests now exercise the CNF and post-filtering features. * Fixes to RowBasedValueMatcherFactory and to bound filters. - Change Comparables to Strings in ValueMatcher related code. - Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests. - Fix bound filters on long columns with non-numeric bounds, and add tests.	2016-11-30 16:10:05 -08:00
Gian Merlino	6922d684bf	GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686 ) v1 subqueries try to use aggregators to "transfer" values from the inner results to an incremental index, but aggregators can't transfer all kinds of values (strings are a common one). This is a workaround that selectively ignores what the outer aggregators ask for and instead assumes that we know best. These are in the same commit because the name validation changed the kinds of errors that were thrown by v1 subqueries.	2016-11-29 12:35:03 +05:30
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701 )	2016-11-22 16:42:28 -08:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Keuntae Park	094f5b851b	Support Min/Max for Timestamp (#3299 ) * Min/Max aggregator for Timestamp * remove unused imports and method * rebase and zip the test data * add docs	2016-11-14 23:00:21 -08:00
Gian Merlino	9ad34a3f03	groupBy v1: Force all dimensions to strings. (#3685 ) Fixes #3683.	2016-11-14 09:30:18 -08:00
Jisoo Kim	7c0f462fbc	fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666 )	2016-11-11 18:46:25 -08:00
Roman Leventov	fbbb55f867	Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679 ) * Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics * Remove unused imports * Use empty string instead of "testing-version" as a version placeholder	2016-11-11 17:17:27 -06:00
Akash Dwivedi	3e408497b3	Migrating bytebuffercollections from Metamarkets. (#3647 ) * Migrating bytebuffercollections from Metamarkets. * resolving code conflicts and removing <p> from bytebuffer-collections.	2016-11-11 10:51:07 -08:00
Gian Merlino	fd5451486c	Short-circuiting AndFilter. (#3676 ) If any of the bitmaps are empty, the result will be false.	2016-11-11 10:14:56 -08:00
Gian Merlino	657e4512d2	Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660 ) Excludes tests from AvoidStaticImport, since those are used often there and I didn't want to make this changeset too large. Production code use was minimal and I switched those to non-static imports.	2016-11-05 11:34:36 -07:00
Gian Merlino	4cbebd0931	SubstringDimExtractionFn, BoundDimFilter: Implement typical style toString. (#3658 )	2016-11-04 13:31:47 -07:00
Gian Merlino	600bbd4a17	BucketExtractionFn: Implement hashCode, fix toString. (#3656 )	2016-11-04 11:24:02 -07:00
Gian Merlino	8b3c86f41f	Fix FilteredAggregatorFactory toString formatting. (#3657 )	2016-11-04 11:23:55 -07:00
Gian Merlino	2c504b6258	Add "like" filter. (#3642 ) * Add "like" filter. * Addressed some PR comments. * Slight simplifications to LikeFilter. * Additional simplifications. * Fix comment in LikeFilter. * Clarify comment in LikeFilter. * Simplify LikeMatcher a bit. * No use going through the optimized path if prefix is empty. * Add more tests.	2016-11-04 23:25:03 +05:30
Navis Ryu	b99e14e732	Support configuration for handling multi-valued dimension (#2541 ) * Support configuration for handling multi-valued dimension * Addressed comments * use MultiValueHandling.ofDefault() for missing policy	2016-11-03 22:38:54 -06:00
Navis Ryu	e10def32f2	Support string type in math expression (#2836 ) * Support string type in math expression addressed comments addressed comments Addressed comments * Updated math function document * Addressed comments	2016-11-02 21:10:48 -06:00
kaijianding	2961406b90	fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644 )	2016-11-02 15:40:07 +05:30
Roman Leventov	4b0d6cf789	Fix resource leaks (ComplexColumn and GenericColumn) (#3629 ) * Remove unused ComplexColumnImpl class * Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs * Use concise try-with-resources syntax in several places * Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter * Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts thew exceptions	2016-11-02 09:23:52 +05:30
Gian Merlino	45940d6e40	Math expressions support for missing columns. (#3630 ) Also add SchemaEvolutionTest to help test this kind of thing. Fixes #3627 and includes test for #3625.	2016-11-01 09:40:25 -07:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Navis Ryu	3fca3be9ea	SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617 )	2016-10-30 11:47:29 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Navis Ryu	898c1c21af	More best-effort parse long (#3603 ) * More best-effort parse long * addressed comments	2016-10-25 10:31:51 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Navis Ryu	8b7ff4409a	Math expressional parameters for aggregator (#2783 ) * Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820) * Addressed comments * addressed comments	2016-10-19 13:58:35 -05:00
Roman Leventov	b113a34355	In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587 )	2016-10-18 09:06:42 -05:00
Charles Allen	2c5c8198db	Make query/cpu/time still report on error (#3535 )	2016-10-18 08:26:21 -05:00
Roman Leventov	9611358f0a	Small topn scan improvements (#3526 ) * Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue() * Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue() * Remove unused import	2016-10-17 10:36:19 -07:00
Gian Merlino	285516bede	Workaround non-thread-safe use of HLL aggregators. (#3578 ) Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and restoring position rather than duplicating the storage buffer of the right-hand side) could cause corruption in the face of concurrent writes. This patch works around the issue by duplicating the storage buffer in "get" before returning a collector. The returned collector still shares data with the original one, but the situation is no worse than before #3314. In the future we may want to consider making a thread safe version of HLLC that avoids these kinds of problems in realtime indexing. But for now I thought it was best to do a small change that restored the old behavior.	2016-10-17 09:39:12 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
Roman Leventov	85ac8eff90	Improve performance of IndexMergerV9 (#3440 ) * Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9 * Don't mask index in MergeIntIterator.makeQueueElement() * DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator * Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable * Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }	2016-10-13 08:28:46 -07:00
Charles Allen	76e77cb610	Make segment creation gauva 14 friendly (#3520 )	2016-10-05 15:25:03 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Charles Allen	654e1db309	Add simple test to FunctionalExtractionTest (#3522 )	2016-09-28 23:45:15 -07:00
Gian Merlino	d5a8a35fec	groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502 ) - Fix GroupByRowProcessor config overrides - Fix GroupByRowProcessor resource limit checking - Invert subquery context overrides such that for the subquery, its own keys override keys from the outer query, not the other way around. The last bit is necessary for the test to work, and seems like a better way to do it anyway.	2016-09-23 14:41:09 -07:00
Gian Merlino	7195be32d8	groupBy v2: Fix dangling references. (#3500 ) Acquiring references in the processing task prevents dangling references caused by canceled processing tasks.	2016-09-24 01:59:11 +05:30
Gian Merlino	f8d71fc602	groupBy: Fix maxMergingDictionarySize config. (#3488 )	2016-09-22 10:02:33 -07:00
Gian Merlino	c87ecea975	Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487 )	2016-09-22 09:12:02 -07:00
Navis Ryu	49c0fe0e8b	Show candidate hosts for the given query (#2282 ) * Show candidate hosts for the given query * Added test cases & minor changes to address comments * Changed path-param to query-pram for intervals/numCandidates	2016-09-22 08:32:38 -07:00
Keuntae Park	54ec4dd584	Support renaming of outputName for cached select and search query results (#3395 ) * support renaming of outputName for cached select and search queries * rebase and resolve conflicts * rollback CacheStrategy interface change * updated based on review comments	2016-09-20 08:19:14 -07:00
Charles Allen	95e08b38ea	[QTL] Reduced Locking Lookups (#3071 ) * Lockless lookups * Fix compile problem * Make stack trace throw instead * Remove non-germane change * * Add better naming to cache keys. Makes logging nicer * Fix #3459 * Move start/stop lock to non-interruptable for readability purposes	2016-09-16 11:54:23 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Gian Merlino	d108461838	groupBy v2: Parallel disk spilling. (#3433 ) In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch from hash-based partitioning to thread-based partitioning. This stops processing threads from blocking each other while spilling is occurring.	2016-09-09 16:49:58 -06:00
Gian Merlino	1e3f94237e	groupBy v2: Configurable load factor. (#3437 ) Also change defaults: - bufferGrouperMaxLoadFactor from 0.75 to 0.7. - maxMergingDictionarySize to 100MB from 25MB, should be more appropriate for most heaps.	2016-09-07 14:14:59 -05:00
Roman Leventov	4f0bcdce36	Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 (#3422 ) * Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The exact purpose for this change is to allow running IndexMergeBenchmark in Windows, however should also be universally 'better' than non-deterministic unmapping, done when MappedByteBuffers are garbage-collected (BACKEND-312) * Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9 * Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources * Reformat IndexIO	2016-09-07 10:43:47 -07:00
Gian Merlino	8d2ae144a8	groupBy: Short-circuit identity preCompute manipulators. (#3434 )	2016-09-06 22:28:44 -06:00
Gian Merlino	1d07964987	LimitedTemporaryStorage: Fix perf bug. (#3432 ) FilterOutputStream has an inefficient implementation of write(byte[], int, int). So let's extend OutputStream directly and use efficient implementations of all methods.	2016-09-06 15:39:36 -07:00
Gian Merlino	8ed1894488	groupBy: Omit timestamp from merge key when granularity = all. (#3416 ) Fixes #3412.	2016-09-01 09:02:54 -07:00
Gian Merlino	6d25c5e053	Avoid materializing all groupBy results with order + limit. (#3410 ) The old TopNFunction code did Sequences.toList on the input sequence before using a priority queue to find the top N items. Now, the priority queue is used in an accumulator, so there is no need to fully materialize the results. Also removed equals/hashCode from the limitFn and remove limitFn from the GroupByQuery's hashCode, since that wasn't necessary and the implementation of hashCode wasn't correct anyway.	2016-08-31 14:08:07 -07:00
Gian Merlino	1268e2902c	Add groupBy test for multiple multi-value dimensions. (#3415 )	2016-08-31 11:21:10 -07:00
Gian Merlino	e9050c2b4c	TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411 )	2016-08-31 20:58:53 +05:30
Keuntae Park	0076b5fc1a	Interval bug fix for search query (#2903 ) * support query granularity and interval for search query * skip unncessary bitmap calculation when query interval contains whole the data interval of the given segments. * use binary search to find start and end index for the given interval * fix based on comment * bug fix based on the review comments and add unit tests	2016-08-31 20:52:44 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Jonathan Wei	4e91330a17	Use DimensionSpec in CardinalityAggregatorFactory (#3406 ) * Use DimensionSpec in CardinalityAggregatorFactory * Address PR comments * Fix requiredFields()	2016-08-30 15:54:02 -07:00
Gian Merlino	b11e9544ea	GroupBy v2: Improve hash code distribution. (#3407 ) Without this transformation, distribution of hash % X is poor in general. It is catastrophically poor when X is a multiple of 31 (many slots would be empty).	2016-08-30 12:09:08 +05:30
kaijianding	f037dfcaa4	fix missing segments duplicate retried (#3398 )	2016-08-29 23:46:21 +05:30
jaehong choi	2e0f253c32	introducing lists of existing columns in the fields of select queries' output (#2491 ) * introducing lists of existing columns in the fields of select queries' output * rebase master * address the comment. add test code for select query caching * change the cache code in SelectQueryQueryToolChest to 0x16	2016-08-25 21:37:53 +05:30
rajk-tetration	362b9266f8	Adding filters for TimeBoundary on backend (#3168 ) * Adding filters for TimeBoundary on backend Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com> * updating TimeBoundaryQuery constructor in QueryHostFinderTest * add filter helpers * update filterSegments + test * Conditional filterSegment depending on whether a filter exists * Style changes * Trigger rebuild * Adding documentation for timeboundaryquery filtering * added filter serialization to timeboundaryquery cache * code style changes	2016-08-15 10:25:24 -07:00
Gian Merlino	e1b0b7de3e	IndexBuilder: Allow replacing rows, customizable maxRows. (#3359 )	2016-08-12 15:22:45 -07:00
Jonathan Wei	454587857c	Make StringComparator deserialization case-insensitive (#3356 )	2016-08-11 18:00:11 -07:00
Himanshu	043562914d	Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341 )	2016-08-10 11:03:44 -07:00
Gian Merlino	1eb7a7e882	Restore optimizations in BoundFilter. (#3343 )	2016-08-10 08:53:17 -07:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344 ) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00

1 2 3 4 5 ...

1697 Commits