Commit Graph

289 Commits

Author SHA1 Message Date
Xavier Léauté e01ed16030 serde tests + equals/hashCode fixes for extraction functions 2015-03-11 16:48:28 -07:00
Xavier Léauté d3f5bddc5c Add ability to apply extraction functions to the time dimension
- Moves DimExtractionFn under a more generic ExtractionFn interface to
  support extracting dimension values other than strings
- pushes down extractionFn to the storage adapter from query engine
- 'dimExtractionFn' parameter has been deprecated in favor of 'extractionFn'
- adds a TimeFormatExtractionFn, allowing to project the '__time' dimension
- JavascriptDimExtractionFn renamed to JavascriptExtractionFn, adding
  support for any dimension value types that map directly to Javascript
- update documentation for time column extraction and related changes
2015-03-11 16:45:42 -07:00
Xavier Léauté 217e674063 Handling aggregators and post aggregators with duplicate names
* add test for same-name groupBy hyperUniques post-agg
* add test for same-name post-agg in groupby with approx histogram
* Fixes https://github.com/druid-io/druid/issues/1045
* Throws an error if post aggs and aggs do not have unique names
* Add more groupBy tests for Having filters
2015-03-10 17:10:43 -07:00
Xavier Léauté 0d47c0c36d normal division and configurable ordering for ArithmeticPostAggregator
Fixes #510
2015-03-04 12:44:24 -08:00
Himanshu Gupta 29039fd541 interval chunk query runner now processes individual chunk in a thread pool and prints metrics query/time per chunk 2015-03-02 15:45:09 -06:00
Xavier Léauté c5e99bf6ec Merge pull request #1105 from metamx/fixEmptyExtractionFilter
Fix empty results on ExtractionFilter.
2015-02-10 14:25:58 -08:00
Charles Allen b9cb311a52 Fix empty results on ExtractionFilter.
* Now returns empty results rather than erroring out
* Added unit tests for multiples case
2015-02-10 14:04:38 -08:00
fjy 708759e1e0 Update http-client to 1.0.0 2015-02-10 13:36:47 -08:00
Xavier Léauté a7dcaffb53 fix `__time` column selector for incremental index
- also adds tests for selecting the time column
2015-02-06 12:06:05 -08:00
Fangjin Yang 25cf15824b Merge pull request #1085 from gianm/dsmrv-fix
DataSourceMetadataResultValue fixes and JodaUtils adjustments.
2015-02-03 17:51:33 -08:00
Gian Merlino 085ad8d345 Fix DataSourceMetadataResultValue serde. 2015-02-03 17:39:42 -08:00
fjy 3e5d338c8e Remove non friendly dependencies from Druid 2015-02-03 11:36:08 -08:00
Eric Tschetter 42eba986ce Towards consistent null handling
This commit also includes
1) the addition of a context parameter on timeseries queries that allows it to ignore empty buckets instead of generating results for them
2) A cleanup of an unused method on an interface
2015-02-02 12:53:07 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté f24a89a22a fix NPE for topN over missing hyperUniques column 2015-01-27 16:12:41 -08:00
Fangjin Yang 91a79dbf95 Merge pull request #1031 from metamx/ingestmetadata-query
DataSourceMetadata query
2015-01-19 21:55:35 -08:00
Charles Allen 7bb038756c Account for very slow writer threads in IncrementalIndexTest 2015-01-17 13:02:59 -08:00
Fangjin Yang b4041c13e5 Merge pull request #1029 from metamx/fixChainedExecutionQueryRunnerTest
Address spurious test failures
2015-01-16 13:08:32 -08:00
Charles Allen 197af967ef Fix concurrency issues in OnheapIncrementalIndex
* Was encountering weird errors when fast writes were coming in while queries were happening.
* Added unit tests which tend to cause concurrency query problems
2015-01-16 12:01:46 -08:00
Charles Allen ebafa2a786 Fix spurious test failures in ChainedExecutionQueryRunnerTest 2015-01-15 16:49:16 -08:00
nishantmonu51 c7452b75f6 Merge branch 'master' into ingestmetadata-query 2015-01-15 18:00:31 +05:30
Xavier Léauté d5f4182de4 global test timeouts + fix test race condition 2015-01-07 23:36:57 -08:00
Xavier Léauté 3fc6cf918d add test for large chunks 2015-01-02 14:31:22 -08:00
Xavier Léauté f2439899e7 fix bitmap factory serde 2014-12-23 15:07:32 -08:00
Xavier Léauté 27a3169312 increase test timeouts 2014-12-19 17:09:43 -08:00
Charles Allen 971afab36f Lengthen CompressionStrategyTest::testKnownSizeConcurrency() to have 2m timeout on its test to account for shared Jenkins build lag 2014-12-19 12:53:20 -08:00
Fangjin Yang be507b8cb4 Merge pull request #943 from mrijke/partialdimextractfn-nullpointer
Fix NullPointerException in PartialDimExtractionFn
2014-12-16 12:29:27 -07:00
nishantmonu51 a0d3579a92 add docs + fix tests 2014-12-11 17:58:01 +05:30
nishantmonu51 32b4f55b8a review comments refactoring 2014-12-11 16:33:14 +05:30
nishantmonu51 3763357f6e Ingest metadata query implementation 2014-12-10 19:44:00 +05:30
nishantmonu51 1a1b0e6f23 merge from master and review comments 2014-12-09 13:16:45 +05:30
Maarten Rijke bd9bbf396c Fix NullPointerException in PartialDimExtractionFn by explicity checking for dimValue == null 2014-12-08 20:11:58 +01:00
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
nishantmonu51 269a51964e fix size calculation 2014-12-04 17:22:24 +05:30
nishantmonu51 4dc0fdba8a consider mapped size in limit calculation & review comments 2014-12-03 23:47:30 +05:30
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
fjy bc173d14fc a whole bunch of cleanup and fixes 2014-12-02 17:32:05 -08:00
nishantmonu51 b65933ffb8 make tests parameterised 2014-12-02 23:55:29 +05:30
nishantmonu51 eac776f1a7 tests passing with on heap incremental index 2014-12-02 22:29:28 +05:30
xvrl 5bc1be5ba0 Merge pull request #850 from metamx/druid-0.7.x-compressionstrategy
Compression strategy changes
2014-11-25 12:58:39 -08:00
Charles Allen c6043afa32 Removed empty function from CompressionStrategyTest 2014-11-25 12:57:06 -08:00
Charles Allen 70e3108282 Multiple speed improvements revolving around topN with HLL
Change serializer / deserializer for HyperLogLog
* Changed DirectDruidClient's InputStream handling. Is now ~10% faster for data heavy queries, and has lower variance in execution speed.
* Changed HLL Collector's toByteStream() method to be better optimized for small values. Is notably faster for small result quantities which fall into the sparse HLL bucket codepath.
  * No change for dense HLL which just uses a direct bytestream of the underlying byte data.

TopNNumericResultBuilder semi-aggressive loop unrolling for metricVals

Benchmark for HLL for sparse packing (small HLL bucket population):
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[0]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 216, GC.time: 0.42, time.total: 15.96, time.warmup: 0.22, time.bench: 15.74
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[1]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 217, GC.time: 0.45, time.total: 13.87, time.warmup: 0.02, time.bench: 13.85
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[2]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 55, GC.time: 0.16, time.total: 4.13, time.warmup: 0.00, time.bench: 4.12
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[3]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 55, GC.time: 0.16, time.total: 4.30, time.warmup: 0.00, time.bench: 4.30
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[4]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 8, GC.time: 0.03, time.total: 1.10, time.warmup: 0.00, time.bench: 1.09
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[5]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 8, GC.time: 0.03, time.total: 0.72, time.warmup: 0.00, time.bench: 0.72
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[6]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.00, time.total: 0.60, time.warmup: 0.00, time.bench: 0.60
HyperLogLogSerdeBenchmarkTest.benchmarkToByteBuffer[7]: [measured 100000 out of 100100 rounds, threads: 1 (sequential)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 2, GC.time: 0.01, time.total: 0.26, time.warmup: 0.00, time.bench: 0.25

Updates to HyperLogLogCollector toByteBuffer() based on code review

Removed changes from DirectDruidClient from this branch and put it in another branch.

Changed HyperLogLogCollector to have protected getters and setters

Remove unused ByteOrder from HyperLogLogCollector

Copyright header on HyperLogLogSerdeBenchmarkTest

Now with less ass!

Reformat in TopNNumericResultsBuilder. No code change

Removed unused import in HyperLogLogCollector

Replace AppendableByteArrayInputStream in DirectDruidClient
* Replace with SequenceInputStream fueled by an enumeration of ChannelBufferInputStream which directly wrap the response context ChannelBuffer

Modify TopNQueryQueryToolChest to use Arrays instead of Lists

Modify TopNQueryQueryToolChest to use Arrays instead of Lists

Revert accidental changes to DirectDruidClient

They should be in another merge request:
https://github.com/metamx/druid/pull/893

Fixes from code review
* Extracting names from AggregatorFactory classes now done with TopNQueryQueryToolChest.extractFactoryName
* Renamed variable in TopNNumericResultBuilder
2014-11-24 16:02:00 -08:00
xvrl 9ced097abd Merge pull request #895 from metamx/fix-interval-retry
A set of fixes to retry the query for missing segments in the timeline
2014-11-24 10:23:02 -08:00
Charles Allen aa49e56ed6 Merge remote-tracking branch 'origin/master' into druid-0.7.x-compressionstrategy 2014-11-21 10:29:40 -08:00
fjy ef62bccdec ignore benchmark 2014-11-20 16:52:19 -08:00
nishantmonu51 e3260aa177 Filtered Aggregator fixes + enhancements
- fix NPE on IncrementIndex
- refactor code to support AND, OR filter
- tests for AND & OR filter
- handling for missing column / null values
2014-11-20 15:17:18 -08:00
fjy 47f5c1bd0a fix retry interval is stupid 2014-11-20 12:50:56 -08:00
fjy 3d9d989a9f A set of fixes to retry the query for missing intervals in the timeline 2014-11-20 12:04:37 -08:00