Commit Graph

2576 Commits

Author SHA1 Message Date
Gian Merlino 3d6f409fc8 Fix groupBy on double dimensions. (#4596)
* Fix groupBy on double dimensions.

* Fix tests.

* Fix tests.

* Fix Scan tests.
2017-07-24 23:18:06 -07:00
Gian Merlino 8a4185897e Add filter tests for both floats and doubles. (#4597) 2017-07-24 17:02:02 -07:00
Atul Mohan 4bd0f174ba Changes for deduplication (#4581) 2017-07-24 11:12:23 -05:00
Roman Leventov 7408a7c4ed Refactor CachingClusteredClient.run() (#4489)
* Refactor CachingClusteredClient

* Comments

* Refactoring

* Readability fixes
2017-07-23 23:10:36 +09:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Roman Leventov ae86323dbd Remove unnecessary qualifier (#4565) 2017-07-18 17:40:54 +09:00
Roman Leventov 60cdf94677 Add PMD and prohibit unnecessary fully qualified class names in code (#4350)
* Add PMD and prohibit unnecessary fully qualified class names in code

* Extra fixes

* Remove extra unnecessary fully-qualified names

* Remove qualifiers

* Remove qualifier
2017-07-17 22:22:29 +09:00
Chris Gavin 960cb07ea6 Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474)
* Remove some unnecessary use of boxed types.

* Fix some incorrect format strings.

* Enable IDEA's MalformedFormatString inspection.

* Add a Checkstyle check for finding uses of incorrect logging packages.

* Fix some incorrect usages of the metamx logger.

* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.

* Fix some more places where the wrong number of arguments are provided to format strings.

* Suppress `MalformedFormatString` inspection on legacy logging test.

* Use @SuppressWarnings rather than a noinspection suppression comment.

* Fix some more incorrect format strings.

* Suppress some more incorrect format string warnings where the incorrect string is intentional.

* Log the aggregator when closing it fails.

* Remove some unneeded log lines.
2017-07-13 12:15:32 -07:00
Roman Leventov b2865b7c7b Make possible to start Peon without DI loading of any querying-related stuff (#4516)
* Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory

* Extract QueryablePeonModule and add druid.modules.excludeList config

* Typo
2017-07-12 13:18:25 -05:00
Goh Wei Xiang 53e6b5cb9b Removal of TopNResultMerger because it is vestigial. (#4520) 2017-07-12 13:24:07 +03:00
Roman Leventov ad76f7a1ab Make Filter.getBitmapResult() abstract (#4481) 2017-07-11 12:39:32 -07:00
Akash Dwivedi a108d05f76 Use GenericIndexed v2 supported read() during deserializeColumn (#4463) 2017-07-11 10:18:25 -05:00
Roman Leventov d168a4271e Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY (#4496)
* Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float

* Replace usages in comments

* Fix RTree

* Remove commented code

* Add tests
2017-07-07 09:10:13 -06:00
Roman Leventov fc4fe24dd5 Incorrect use of Long.TYPE and Float.TYPE as return type of ObjectColumnSelector.classOfObject() (#4501) 2017-07-05 08:54:06 -07:00
Jonathan Wei 97a79f4478 Fix GroupBy type cast when ChainedExecutionQueryRunner merges results (#4488)
* Fix GroupBy type cast error when ChainedExecutionQueryRunner merges multiple runners

* Move conversion step to separate method

* Remove unnecessary comment

* Use compute to update map
2017-06-30 17:33:03 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Roman Leventov 6173570425 Add ExtensionsConfig.excludeModules (#4438)
* Add ExtensionsConfig.excludeModules

* Add branch

* Refactor Initialization.getFromExtensions()

* excludeModules -> moduleExcludeList

* Initialization.getFromExtensions() and getLoadedModules() should return Collection, not Set

* Fix doc
2017-06-28 14:01:31 -07:00
Gian Merlino 4c33d0a00f Add some new expression functions and macros. (#4442)
* Add some new expression functions and macros.

See misc/math-expr.md for the list of added functions, except for
"like", which previously existed but was not documented.

* Add easymock to datasketches tests.

* Add easymock to distinctcount tests.

* Add easymock to virtual-columns tests.

* Code review comments.

* Clean up code a bit.

* Add easymock to scan-query tests.

* Rework ExprMacros that have multiple impls.

* Improve test coverage.
2017-06-28 10:15:58 -07:00
Roman Leventov 2fa4b10145 More fine-grained DI for management node types. Don't allocate processing resources on Router (#4429)
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router

* Fix examples

* Fixes

* Revert Peon configs and add comments

* Remove qualifier
2017-06-27 22:58:01 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Jihoon Son 3e60c9125d Increase timeout of GroupByQueryMergeBufferTest and AppenderatorDriverTest (#4441) 2017-06-22 06:09:52 -07:00
Jihoon Son 3a5c480405 Split IndexMergerTest and ImmutableConciseSetTest (#4427) 2017-06-21 20:55:51 -07:00
Gian Merlino 34d2f9ebfe Queries: Restore old prepareAggregations method. (#4432)
For backwards compatibility, post-#4394.
2017-06-21 05:36:32 -07:00
Gian Merlino 679cf277c0 Add ExpressionFilter. (#4405)
* Add ExpressionFilter.

The expression filter expects a single argument, "expression", and matches
rows where that expression is true.

* Code review comments.

* CR comment.

* Fix logic.

* Fix test.

* Remove unused import.
2017-06-20 12:42:26 -07:00
Gian Merlino 22aad08a59 ExpressionPostAggregator: Automatically finalize inputs. (#4406)
* ExpressionPostAggregator: Automatically finalize inputs.

Raw HyperLogLogCollectors and such aren't very useful. When writing
expressions like `x / y` users will expect `x` and `y` to be finalized.

* Fix un-merge.

* Code review comments.

* Remove unnecessary ImmutableMap.copyOf.
2017-06-17 13:22:47 -07:00
Goh Wei Xiang f68a0693f3 Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397)
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.

* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex

* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
  - Remove setting of default values
  - Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders

* - Removing unused imports.

* - Helper method for testing with IncrementalIndex.Builder

* - Correction on javadoc.

* Style fix
2017-06-16 16:04:19 -05:00
Gian Merlino 054cf8a183 Limit random access in compressed column tests. (#4414)
* Limit random access in compressed column tests.

Random access leads to lots of block decompressions for reading single elements,
which is time prohibitive for the large column tests. For those tests, limit the
number of randomly accessed elements to 1000.

* Random -> ThreadLocalRandom
2017-06-15 14:48:06 -07:00
Jonathan Wei 7fe295009e Faster ByteBufferMinMaxOffsetHeapTest (#4404) 2017-06-15 13:14:29 -05:00
Gian Merlino 6edee7f434 Expressions work better with strings. (#4394)
* Expressions work better with strings.

- ExpressionObjectSelector able to read from string columns, and able to
  return strings.
- ExpressionVirtualColumn able to offer string (and long for that matter)
  as its native type.
- ExpressionPostAggregator able to return strings.
- groupBy, topN: Allow post-aggregators to accept dimensions as inputs,
  making ExpressionPostAggregator more useful.
- topN: Use DimExtractionTopNAlgorithm for STRING columns that do not
  have dictionaries, allowing it to work with STRING-type expression
  virtual columns.
- Adjusts null handling to better match the rest of Druid: null and
  empty string treated the same; nulls implicitly treated as zeroes in
  numeric context.

* Code review comments.

* More code review.

* Fix test.

* Adjust annotations.
2017-06-14 14:50:18 -07:00
Roman Leventov 113b8007b7 Increase timeout for QueryGranularityTest.testDeadLock() (#4395) 2017-06-12 13:28:21 -07:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Jonathan Wei 9ae04b7375 Remove queryMetricsFactory from GroupByQueryConfig (#4383) 2017-06-07 21:35:26 -05:00
Roman Leventov 63a897c278 Enable most IntelliJ 'Probable bugs' inspections (#4353)
* Enable most IntelliJ 'Probable bugs' inspections

* Fix in RemoteTestNG

* Fix IndexSpec's equals() and hashCode() to include longEncoding

* Fix inspection errors

* Extract global isntance of natural().nullsFirst(); address comments

* Fix

* Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections

* Prohibit Ordering.natural().nullsFirst() using Checkstyle
2017-06-07 09:54:25 -07:00
Roman Leventov b487fa355b More methods in QueryMetrics and TopNQueryMetrics (the last part of #3798) (#4284)
* Add more methods to QueryMetrics and TopNQueryMetrics, add BitmapResultFactory

* Add implementor expectations section to BitmapResultFactory javadoc
2017-06-07 09:49:08 -07:00
kaijianding 551a89bd67 serialize DateTime As Long to improve json serde performance (#4038) 2017-06-06 10:08:51 -07:00
Gian Merlino d22db30db4 VirtualColumns: Block virtual columns with empty names. (#4367)
* VirtualColumns: Block virtual columns with empty names.

* Spelling.
2017-06-06 08:05:47 -07:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
Jonathan Wei b90c28e861 Support limit push down for GroupBy (#3873)
* Support limit push down for GroupBy V2

* Use orderBy spec ordering when applying limit push down

* PR Comments

* Remove unused var

* Checkstyle fixes

* Fix test

* Add comment on non-final variables, fix checkstyle

* Address PR comments

* PR comments

* Remove unnecessary buffer reset

* Fix missing @JsonProperty annotation
2017-06-02 15:39:04 -07:00
praveev 290ed3ab9d Make DateTime timezone aware (#4343)
* Make DateTime timezone aware

* Change unit tests to make DateTime timezone aware for PeriodGranularity
2017-06-02 12:45:52 -07:00
kaijianding 0efd18247b explicitly unmap hydrant files when abandonSegment to recycle mmap memory (#4341)
* fix TestKafkaExtractionCluster fail due to port already used

* explicitly unmap hydrant files when abandonSegment to recyle mmap memory

* address the comments

* apply to AppenderatorImpl
2017-06-01 18:15:30 -05:00
Roman Leventov 50e72c6aea Fix bugs (core) (#4339)
* Fix bugs

* Add test for GoogleDataSegmentPusher.buildPath()

* Exclude extension changes

* Address comments

* Brace
2017-06-02 06:47:59 +09:00
Roman Leventov 78179ef74d Inject QueryMetrics factories via PolyBind (#4336) 2017-05-31 09:07:03 -07:00
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create new pb obj everytime

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
Kamal Gurala dcb07d6958 Option to configure default analysis types in SegmentMetadataQuery (#4259)
* Option to configure default analysis types

* Updated Docs and renamed

* Added serde tests and Null handling

* Fixed Documentation

* Updated implementation

* Updated implementation

* Updated implementation

* Added usingDefaultIntervals in Builder

* Updated implementation

* Updated implementation and added failing test

* filterSegments implementation updated

* Updated imlementation

* Padding

* Add missing Override

* Updated implementation

* Fixed a naming bug

* Fixed bug

* Removed comment
2017-05-26 12:12:39 -07:00
Gian Merlino 1eaa7887bd Fix integer overflow in BufferGrouper. (#4333)
Would have led to out of bounds buffer access with large buffers.
Also added tests using large buffers.
2017-05-25 23:30:20 -07:00
Gian Merlino 2bd4c0930f Fix "quarter" granularity serialization. (#4316) 2017-05-23 10:06:17 -07:00
Gian Merlino 9283807ad7 GroupByQuery: Fix type-spanning comparisons. (#4317)
Jackson deserializes integers sometimes as int and sometimes as long,
depending on how big they are. This leads to ClassCastException
when comparing deserialized values as part of groupBy merging on the
broker.
2017-05-23 10:06:04 -07:00
Gian Merlino 22e5f52d00 Workaround for non-thread-safe use of CardinalityAggregator. (#4304) 2017-05-23 10:33:03 +09:00
Roman Leventov 8ec3a29af0 Don't pass QueryMetrics down in concurrent and async QueryRunners (fixes #4279) (#4288)
* Don't pass QueryMetrics down in concurrent and async QueryRunners

* Rename QueryPlus.threadSafe() to withoutThreadUnsafeState(); Update QueryPlus.withQueryMetrics() Javadocs; Fix generics in MetricsEmittingQueryRunner and CpuTimeMetricQueryRunner; Make DefaultQueryMetrics to fail fast on modifications from concurrent threads
2017-05-22 13:42:09 -05:00
Maksim Logvinenko d45dad2b44 Remove boxing/unboxing in indexer (#4269)
* Remove boxing/unboxing in indexer

* Fix rowIndex visibility

* Cleanup
2017-05-17 19:13:53 -05:00
Roman Leventov d9f423f55d Make QueryMetrics factories configurable (#4268)
* Ensure QueryMetrics factories accept Json ObjectMapper; Make QueryMetrics factories configurable

* Update QueryMetrics Javadocs

* Add javadocs to QueryMetrics factories

* Move queryMetricsFactory defaults to getter methods of config classes
2017-05-17 08:41:59 -07:00
Gian Merlino ddc2e68998 Remove cache keys from HavingSpecs. (#4280)
* Remove cache keys from HavingSpecs.

They weren't used, since they aren't part of the groupBy cache key.
Also, it's good that they weren't used, since many of them had
value truncation bugs.

* Fix imports.

* Fix test.
2017-05-16 22:13:02 -07:00
Roman Leventov d400f23791 Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798) (#4079)
* Monomorphic processing of topN queries with simple double aggregators and historical segments

* Add CalledFromHotLoop annocations to specialized methods in SimpleDoubleBufferAggregator

* Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype

* Fix a bug in SpecializationService

* In SpecializationService, emit maxSpecializations warning only once

* Make GenericIndexed.theBuffer final

* Address comments

* Newline

* Reapply 439c906 (Make GenericIndexed.theBuffer final)

* Remove extra PooledTopNAlgorithm.capabilities field

* Improve CachingIndexed.inspectRuntimeShape()

* Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape()

* Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts

* Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop

* Make ValueMatcher to implement HotLoopCallee

* Doc fix

* Fix inspectRuntimeShape() impl in ExpressionSelectors

* INFO logging of specialization events

* Remove modificator

* Fix OrFilter

* Fix AndFilter

* Refactor PooledTopNAlgorithm.scanAndAggregate()

* Small refactoring

* Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations

* Don't care about runtime shape in tests

* Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests

* Doc wording

* Address comments

* Remove MagicAccessorBridge and ensure Offset subclasses are public

* Attach error message to element
2017-05-16 16:19:55 -07:00
Roman Leventov b7a52286e8 Make @Override annotation obligatory (#4274)
* Make MissingOverride an error

* Make travis stript to fail fast

* Add missing Override annotations

* Comment
2017-05-16 13:30:30 -05:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
Benedict Jin e823085866 Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135) 2017-05-17 01:38:51 +09:00
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
Roman Leventov 1ebfa22955 Update Error prone configuration; Fix bugs (#4252)
* Make Errorprone the default compiler

* Address comments

* Make Error Prone's ClassCanBeStatic rule a error

* Preconditions allow only %s pattern

* Fix DruidCoordinatorBalancerTester

* Try to give the compiler more memory

* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now

* Don't show compiler warnings

* Try different travis script

* Fix travis.yml

* Make Error Prone optional again

* For error-prone compiler

* Increase compiler's maxmem

* Don't run Error Prone for benchmarks because of OOM

* Skip install step in Travis

* Remove MetricHolder.writeToChannel()

* In travis.yml, check compilation before tests, because it may fail faster
2017-05-12 15:55:17 +09:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Himanshu 462f6482df optionally add extensions to explicitly specified hadoopContainerClassPath (#4230)
* optionally add extensions to explicitly specified hadoopContainerClassPath

* note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly
2017-05-08 14:24:14 -05:00
Roman Leventov 8277284d67 Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239) 2017-05-04 11:14:41 -07:00
Roman Leventov 5e85fcc0f5 Restore BaseQuery.computeOverridenContext() for compatibility (#4241) 2017-05-02 10:22:02 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
Roman Leventov b9fd30e90a Add Checkstyle check to prohibit IntelliJ-style commented code lines (#4220)
* Add Checkstyle check to prohibit IntelliJ-style commented code lines

* Address comment

* Restore issue link
2017-04-27 18:11:25 -07:00
kaijianding c47cfed0ec Significantly improve LongEncodingStrategy.AUTO build performance (#4215)
* Significantly improve LongEncodingStrategy.AUTO build performance

* use numInserted instead of tempIn.available

* fix bug
2017-04-27 15:11:07 +03:00
Roman Leventov ee9b5a619a Fix bugs in query builders and in TimeBoundaryQuery.getFilter() (#4131)
* Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids

* Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter()

* Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner

* Add compatibility constructor to BaseQuery

* Remove Query.queryMetrics property

* Move nullToNoopLimitSpec() method to LimitSpec interface

* Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder
2017-04-25 16:32:02 -05:00
kaijianding 336089563d skip rows which are added after cursor created (#4049)
* fix can't get dim value via IncrementalIndexStorageAdapter cursor

* address the comment

* add ut

* address ut comments

* fix bug and fix ut
2017-04-26 03:26:46 +09:00
Jonathan Wei 723a855ab9 Fix nested groupBys with outer exfns on inner numeric columns (#4182) 2017-04-21 19:47:46 -07:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Gian Merlino 60caa641f3 Restore backwards compatibility of Query. (#4185) 2017-04-19 19:47:50 +03:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Ram iyer 2e9589215e removing unused var (#4163) 2017-04-13 04:03:41 +09:00
kaijianding 676af79044 don't do postAgg in TimeseriesQueryQueryToolChest when not necessary (#4155)
* don't do postAgg in TimeseriesQueryQueryToolChest when not necessary

* set postAggs to empty list in TimeseriesQueryQueryToolChest instead of checking finalizing fn

* fix ut

* fix ut again
2017-04-12 15:46:46 +05:30
Roman Leventov 15f3a94474 Copy closer into Druid codebase (fixes #3652) (#4153) 2017-04-10 09:38:45 +09:00
Roman Leventov 73d9b31664 GenericIndexed minor bug fixes, optimizations and refactoring (#3951)
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream

* Fixes

* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils

* Move constructors

* Add GenericIndexedBenchmark

* Comments

* Typo

* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe

* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap

* Optimize TableLongEncodingWriter

* Add checks to SerializerUtils methods

* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()

* Update GenericIndexedBenchmark

* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian

* Add GenericIndexedBenchmark.indexOf()

* More checks in methods in SerializerUtils

* Use helperBuffer.arrayOffset()

* Optimizations in SerializerUtils
2017-03-27 14:17:31 -05:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Erik Dubbelboer 2cbc4764f8 Comparing dimensions to each other in a filter (#3928)
Comparing dimensions to each other using a select filter
2017-03-23 18:23:46 -07:00
Roman Leventov 4b5ae31207 QueryMetrics: abstraction layer of query metrics emitting (part of #3798) (#3954)
* QueryMetrics: abstraction layer of query metrics emitting

* Minor fixes

* QueryMetrics.emit() for bulk emit and improve Javadoc

* Fixes

* Fix

* Javadoc fixes

* Typo

* Use DefaultObjectMapper

* Add tests

* Address PR comments

* Remove QueryMetrics.userDimensions(); Rename QueryMetric.register() to report()

* Dedicated TopNQueryMetricsFactory, GroupByQueryMetricsFactory and TimeseriesQueryMetricsFactory

* Typo

* More elaborate Javadoc of QueryMetrics

* Formatting

* Replace QueryMetric enum with lambdas

* Add comments and VisibleForTesting annotations
2017-03-23 17:23:59 -07:00
Jonathan Wei 79f1a1d7f0 Allow float parameters for Bound/Selector/In filters on long columns (#4074)
* Allow float parameters for long filters

* Use BigDecimal intermediate form for string->long conversions

* PR comments

* PR comments
2017-03-23 14:18:05 -07:00
Akash Dwivedi ff7f90b02d relocate method in BufferAggregator. (#4071)
*  relocate method in BufferAggregator.

* Unused import.

* Detailed javadoc.

* using Int2ObjectMap.

* batch relocate.

* Revert batch relocate.

* Unused import.

* code comments.

* code comment.
2017-03-23 13:07:59 -07:00
David Lim f68ba4128f Exclude pagingIdentifiers that don't apply to a datasource (#4078)
* exclude pagingIdentifiers that don't apply to a datasource to support union datasources

* code review changes

* code review changes
2017-03-22 12:32:27 -07:00
Gian Merlino 1f48198607 Fix some query cache key collisions. (#4094)
The query caches generally store dimensions and aggregators positionally, so
appendCacheablesIgnoringOrder could lead to incorrect results being pulled
from the cache.
2017-03-22 11:08:48 -07:00
Gian Merlino 77b6213222 Remove unused Filters.getLongValueMatcher method. (#4086) 2017-03-21 13:46:07 -06:00
Gian Merlino ad477cb454 Fix topNs with extractionFns but no aggregators. (#4070)
The result sets were empty because of an aggs.length > 0 check. I'm not
sure if it was there for any good reason, but there didn't seem to be one.
2017-03-20 11:31:30 -07:00
Roman Leventov 84fe91ba0b Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798) (#3889)
* Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.

* Use Execs.singleThreaded()

* RuntimeShapeInspector to support nullable fields

* Make CalledFromHotLoop annotation Inherited

* Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory

* Close InputStream in SpecializationService

* Formatting

* Test specialized PooledTopNScanners

* Set flags in PooledTopNAlgorithm directly

* Fix tests, dependent on CountAggragatorFactory toString() form

* Fix

* Revert CountAggregatorFactory changes

* Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector

* Remove duplicate RoaringBitmap dependency in the extendedset pom.xml

* Fix

* Treat ByteBuffers specially in StringRuntimeShape

* Doc fix

* Annotate BufferAggregator.init() with CalledFromHotLoop

* Make triggerSpecializationIterationsThreshold an int

* Remove SpecializationService.PerPrototypeClassState.of()

* Add comments

* Limit the amount of specializations that SpecializationService could make

* Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions

* Use more efficient ConcurrentMap's idioms in SpecializationService
2017-03-17 14:44:36 -05:00
Gian Merlino 3ec1877887 Fix BucketExtractionFn on objects that are strings. (#4072) 2017-03-16 22:59:11 -07:00
Charles Allen 805d85afda Allow compilation as Java8 source and target (#3328)
* Allow compilation as Java8 source and target for everything except API

* Remove conditions in tests which assume that we may run with Java 7

* Update easymock to 3.4

* Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration

* Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity()

* Remove java7 special for druid-api
2017-03-14 22:23:47 -06:00
Gian Merlino e5c0dab12c groupBy v2: Better error message when resources are exhausted. (#4046)
* groupBy v2: Better error message when resources are exhausted.

Fixes #4043.

* Fix tests.
2017-03-15 00:37:49 +05:30
Jihoon Son dfe4bda7fd add doc (#4030) 2017-03-10 12:49:20 -08:00
Gian Merlino a5170666b6 groupBy v2: Always merge queries. (#4023)
This fixes #4020 because it means the timestamp will always be included for outermost
queries. Historicals receiving queries from older brokers will think they're
outermost (because CTX_KEY_OUTERMOST isn't set to "false"), so they'll include a
timestamp, so the older brokers will be OK.
2017-03-08 12:47:46 -06:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Gian Merlino 7b9e6c29cd Fix float, long dimension indexer object selectors. (#4012)
Their "convertUnsortedEncodedKeyComponentToActualArrayOrList" methods didn't respect the contract,
which says they should return single values (not array/list) if there is only a single value
to return. This affects the behavior of ObjectColumnSelectors on realtime segments.
2017-03-06 10:01:30 -08:00
Gian Merlino 337f3870d8 Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007)
* Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided.

* Remove unused import.

* Use defaults in cache key.
2017-03-04 17:41:59 -08:00
praveev 67d0ae3271 Let toDateTime call fall through for Duration Granularity (#4001)
* Let toDateTime call fall through for Duration Granularity

Added test for the same.

* Add duration granularity test to GroupByQueryRunnerTest
2017-03-03 13:27:22 -06:00
Himanshu e7e3c2dc5a support singleThreaded flag for groupBy-v2 as well (#3992) 2017-03-03 23:43:06 +05:30
Roman Leventov 81a5f9851f TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990)
* In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir

* Use FileUtils.forceMkdir() across the codebase and remove some unused code

* Fix test

* Fix PullDependencies.run()

* Unused import
2017-03-02 14:05:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Himanshu 8316b4f48f fix TimeDimExtractionFn.apply() under concurrency (#3984) 2017-03-01 13:07:12 -08:00
kaijianding 772de66e79 add filenameBase to log when exceed file size limit to indicate which column it is (#3982) 2017-03-01 13:05:07 -08:00
Gian Merlino cc20133e70 Checkstyle rule to outlaw tabs. (#3988)
Tabs are the worst.
2017-02-28 23:52:53 -08:00
Akash Dwivedi 91344cbe57 Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987)
* Enable GenericIndexed V2 for complex columns.

* SerializerBuilder to use  GenericColumnSerializer.
2017-02-28 22:06:54 -08:00
Jonathan Wei a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Jonathan Wei 58b704c3b4 Don't allow '__time' as a GroupBy output field name (#3967)
* Don't allow '__time' as a GroupBy column field name

* Tweak exception message
2017-02-23 14:39:17 -08:00
kaijianding 7ce05d58bc fix NPE in search query when dimension contains null value (#3968)
* fix NPE when dimension contains null value in search query

* add ut

* search with not existed dimension should always return empty result
2017-02-23 08:07:59 -08:00
Gian Merlino 372b84991c Add virtual columns to timeseries, topN, and groupBy. (#3941)
* Add virtual columns to timeseries, topN, and groupBy.

* Fix GroupByTimeseriesQueryRunnerTest.

* Updates from review comments.
2017-02-22 13:16:48 -08:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Gian Merlino 985203b634 Finalize fields in postaggs (#3957)
* initial commits for finalizeFieldAccess #2433

* fix some bugs to run a query

* change name of method Queries.verifyAggregations to Queries.prepareAggregations

* add Uts

* fix Ut failures

* rebased to master

* address comments and add a Ut for arithmetic post aggregators

* rebased to the master

* address the comment of injection within arithmetic post aggregator

* address comments and introduce decorate() in the PostAggregator interface.

* Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name.

* Fix a code style mismatch.

* Rebased to the master
2017-02-21 16:32:14 -08:00
Gian Merlino a47206eaf8 Ability to filter on virtual columns. (#3942)
This didn't need much other than having BitmapIndexSelector return null from
various methods to trigger cursor based filtering.
2017-02-21 16:03:31 -08:00
Jihoon Son 128274c6f0 Disable caching on brokers for groupBy v2 (#3950)
* Disable caching on brokers for groupBy v2

* Rename parameter

* address comments
2017-02-21 09:49:49 -08:00
Jonathan Wei bc33b68b51 Use GroupBy V2 as default (#3953)
* Use GroupBy V2 as default

* Remove unused line

* Change assert to exception propagation
2017-02-18 07:40:40 -08:00
kaijianding 361d9d9802 fix dynamic schema data can't rollup correctly (#3949)
* fix dynamic schema data can't rollup correctly

* add ut
2017-02-17 15:07:29 -06:00
Akash Dwivedi 797488a677 Removing Integer.MAX column size limit. (#3743)
* Removing Integer.MAX column size limit.

* On demand creation of headerLong, use v2 instead of v3

* Avoid reusing the same object from a previous test.

* Avoid reusing the same object from a previous test part#2

* code formatting.

* GenericIndexed/Writer code review changes.

* GenericIndexed/writer code review requested changes.

* checkIndex() to static

* native endianess for genericIndexedV2, code  review requested changes.

* Formatting

* Hll fix.

* use native endianess during bag size calculation.

* Code review requested changes.

* IOPeon close() changes.

* use different tmp directory path for testing.

* Code review requested changes.
2017-02-16 20:09:43 -06:00
Jihoon Son a459db68b6 Fine grained buffer management for groupby (#3863)
* Fine-grained buffer management for group by queries

* Remove maxQueryCount from GroupByRules

* Fix code style

* Merge master

* Fix compilation failure

* Address comments

* Address comments

- Revert Sequence
- Add isInitialized() to Grouper
- Initialize the grouper in RowBasedGrouperHelper.Accumulator
- Simple refactoring RowBasedGrouperHelper.Accumulator
- Add tests for checking the number of used merge buffers
- Improve docs

* Revert unnecessary changes

* change to visible to testing

* fix misspelling
2017-02-14 12:55:54 -08:00
Gian Merlino af67e8904e PreComputedHyperUniquesSerde: Fix formatting. (#3932) 2017-02-14 09:32:29 -08:00
DaimonPl a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Akash Dwivedi 8854ce018e File.deleteOnExit() (#3923)
* Less use of File.deleteOnExit()
 * removed deleteOnExit from most of the tests/benchmarks/iopeon
 * Made IOpeon closable

* Formatting.

* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
2017-02-13 15:12:14 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Pierre 9ab9feced6 Close all aggregators when closing onHeapIncrementalIndex (#3926)
* Close all aggregators when closing onHeapIncrementalIndex

* Aggregators are now handled as Closeables, remove unnecessary mock in test

* Fix variable shadowing
2017-02-13 15:01:27 -08:00
Jihoon Son 991e2852da Add PostAggregators to generator cache keys for top-n queries (#3899)
* Add PostAggregators to generator cache keys for top-n queries

* Add tests for strings

* Remove debug comments

* Add type keys and list sizes to cache key

* Make post aggregators used for sort are considered for cache key generation

* Use assertArrayEquals()

* Improve findPostAggregatorsForSort()

* Address comments

* fix test failure

* address comments
2017-02-13 12:23:44 -08:00
Parag Jain 33c635aff2 use as() method of base segment in reference counting segment (#3921) 2017-02-09 20:24:47 -06:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Gian Merlino 97765fdfef Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity. (#3910)
* Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity.

LikeFilter:
- Reduce code duplication, and simplify methods, at the cost of incurring an extra box
  of ImmutableBitmap into a SingletonImmutableList. I think this is fine, since this
  should be cheap and the code path is not hot (just once per filter).

Filters:
- Make estimateSelectivity public since it seems intended that they be used by Filter
  implementations, and Filters from extensions may want to use them too. Removed
  @VisibleForTesting for the same reason.
- Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since
  predicates aren't involved.

* Address PR comments.

* Remove unused import

* Change List to Collection
2017-02-08 13:46:01 -06:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Jihoon Son ddd8c9ef97 Add filter selectivity estimation for auto search strategy (#3848)
* Add filter selectivity estimation for auto search strategy

* Addressed comments

* Lazy bitmap materialization for bitmap sampling and java docs

* Addressed comments.

- Fix wrong non-overlap ratio computation and added unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>

* Addressed comments

- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()

* Fix test failure and add a test for RangeIntList

* fix code style

* Diabled selectivity estimation for multi-valued dimensions

* Address comment
2017-02-06 11:15:03 -08:00
Parag Jain 8a13a85765 Introduce SegmentizerFactory (#3901)
* Introduce SegmentizerFactory
- that knows how to deserialize specific type of segment
- Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment
- Unit test for the default behavior

* review comments
2017-02-06 10:05:12 -08:00
DaimonPl 93b71e265e Extract HLL related code to separate module (#3900) 2017-02-03 09:45:11 -08:00
Jonathan Wei 182261f713 Allow configurable temp directory for query processing (#3893) 2017-02-02 10:22:28 -08:00
Jonathan Wei e6b95e80aa Remove deprecated Aggregator/AggregatorFactory methods (#3894) 2017-02-01 14:43:18 -08:00
Gian Merlino d3a3b7ba0c Add virtual column types, holder serde, and safety features. (#3823)
* Add virtual column types, holder serde, and safety features.

Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core

Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
  when a new base column is added that happens to have the same name as a
  virtual column.

* Fix ExtractionDimensionSpecs with virtual dimensions.

* Fix unused imports.

* CR comments

* Merge one more time, with feeling.
2017-01-26 18:15:51 -08:00
Roman Leventov 75d9e5e7a7 DimensionSelector-related bug fixes and optimizations (fixes #3799, part of #3798) (#3858)
*  * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils;
 * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality;
 * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers;
 * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector;
 * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher;
 * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn

* Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher

* Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method

* Cache baseIdLookup in ForwardingFilteredDimensionSelector

* Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull)

* Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available

* Doc fixes

* Addressed comments

* Fix

* Fix

* Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector

* throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm
2017-01-25 15:28:27 -08:00
Gian Merlino 3136dfa421 LikeFilter: Read value lazily when doing a prefix-based match. (#3880)
This speeds up cases where we don't actually need to read the value,
such as "LIKE 'foo%'".
2017-01-25 13:22:07 -08:00
Roman Leventov af93a8d189 Sequences refactorings and removed unused code (part of #3798) (#3693)
* Removing unused code from io.druid.java.util.common.guava package; fix #3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617)

* Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close()

* Fix strange code in MetricsEmittingQueryRunner

* Add comment on why YieldingSequenceBase is used in Sequences.withEffect()

* Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders
2017-01-19 20:07:43 -08:00
kaijianding 33ae9dd485 streaming version of select query (#3307)
* streaming version of select query

* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity

* respect query limit within historical

* use constant

* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner

* add some test for scan query

* add scan query document

* fix merge conflicts

* add compactedList resultFormat, this format is better for json ser/der

* respect query timeout

* respect query limit on broker

* use static consts and remove unused code
2017-01-19 16:09:53 -06:00
Slim 558dc365a4 renaming classes to be run by mvn and comment non operational tests (#3847) 2017-01-17 11:59:12 -08:00
Akash Dwivedi dd0c4e2ead Migrating extendedset from Metamarkets. (#3694)
* Migrating extendedset from Metamarkets.

* Notice change

* More details in NOTICE

* NOTICE formatting.

* suppress header checkstlye for extendedset.
2017-01-17 10:10:27 -08:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Jihoon Son c099977a5b Add an option to SearchQuery to choose a search query execution strategy (#3792)
* Add an option to SearchQuery to choose a search query execution strategy.

Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query

* Add SearchStrategy and SearchQueryExecutor

* Address comments

* Rename strategies and set UseIndexesStrategy as the default strategy

* Add a cost-based planner for auto strategy

* Add document

* Fix code style

* apply code style

* apply comments
2017-01-10 18:04:20 -08:00
Gian Merlino 3c63cff57a Remove makeMathExpressionSelector from ColumnSelectorFactory. (#3815)
* Remove makeMathExpressionSelector from ColumnSelectorFactory.

* Add @Nullable annotations in places, fix Number.class check.

* Break up createBindings, add tests.

* Add null check.
2017-01-05 18:06:38 -08:00
Gian Merlino 220ca7ebb6 Ignore DimFilterHavingSpec testConcurrentUsage. (#3814) 2017-01-03 17:43:58 -07:00
Gian Merlino d8702ebece Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797)
* Filters: Use ColumnSelectorFactory directly for building row-based matchers.

* Adjustments based on code review.

- BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing.
- HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec
  not thread safe.
- Renamed rowType to rowSignature.
- Added specializations for time-based vs non-time-based DimensionSelector in RBCSF.
- Added convenience method DimensionHanderUtils.createColumnSelectorPlus.
- Added singleton ZeroIndexedInts.
- Added test cases for DimFilterHavingSpec.

* Make ValueMatcherColumnSelectorStrategy actually use the associated selector.

* Add RangeIndexedInts.

* DimFilterHavingSpec: Fix concurrent usage guard on jdk7.

* Add assertion to ZeroIndexedInts.

* Rename no-longer-volatile members.
2017-01-03 14:30:22 -08:00
Roman Leventov 33800122ad Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631) 2016-12-26 00:35:35 -06:00
Jonathan Wei 0e5bd8b4d4 Add dimension type-based interface for query processing (#3570)
* Add dimension type-based interface for query processing

* PR comment changes

* Address PR comments

* Use getters for QueryDimensionInfo

* Split DimensionQueryHelper into base interface and query-specific interfaces

* Treat empty rows as nulls in v2 groupby

* Reduce boxing in SearchQueryRunner

* Add GroupBy empty row handling to MultiValuedDimensionTest

* Address PR comments

* PR comments and refactoring

* More PR comments

* PR comments
2016-12-21 20:11:37 -07:00
Jonathan Wei 2bfcc8a592 First and Last Aggregator (#3566)
* add first and last aggregator

* add test and fix

* moving around

* separate aggregator valueType

* address PR comment

* add finalize inner query and adjust v1 inner indexing

* better test and fixes

* java-util import fixes

* PR comments

* Add first/last aggs to ITWikipediaQueryTest
2016-12-16 15:26:40 -08:00
Himanshu ed322a4beb remove size from default analysisTypes list for segmentMetadata query (#3773) 2016-12-13 18:01:21 -08:00
Jonathan Wei 880a021a7a Fix missed travis failures from PR 3567 and 2798 (#3761)
* Fix checkstyle failures from PR 3567

* Fix GranularityPathSpecTest compile failure
2016-12-07 19:07:31 -08:00
Erik Dubbelboer bb9e35e1af Add Greatest and Least post aggregations (#3567) 2016-12-07 17:58:23 -08:00
Roman Leventov dc8f814acc Optimize Iterator<ImmutableBitmap> implementation inside Filters.matchPredicate() so that it doesn't emit empty bitmap in the end of the iteration, and make it to follow Iterator contract, that is throw NoSuchElementException from next() if there are no more bitmaps (#3754) 2016-12-07 12:54:09 -08:00
Jonathan Wei d1896a2d62 Disable flush after every ObjectMapper write (#3748) 2016-12-06 16:45:23 -08:00
Gian Merlino b1bac9f2d3 groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740)
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
2016-12-06 16:17:32 -08:00
Himanshu 45da7e48f1 groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672)
* sortByDimsFirst flag for groupBy query

* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>

* fix review comments

* fix review comments regarding removing code duplication of dim/time comparison

* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde

* remove unnecessary system.out.println

* make access static var NATURAL_NULLS_FIRST directly

* further review comments addressing
2016-12-06 09:48:56 -08:00
Navis Ryu c74d267f50 Support virtual column for select query (#2511)
* Support virtual column for select query

* Addressed comments
2016-12-05 15:14:35 -08:00
Gian Merlino b64e06704e Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732) 2016-12-02 15:27:54 -08:00
Gian Merlino f4cc8c2b2f IndexBuilder: Close IncrementalIndex when done. (#3734) 2016-12-02 16:56:34 -06:00
Gian Merlino 353fee79dd Add "asMillis" option to "timeFormat" extractionFn. (#3733)
This is useful for chaining extractionFns that all want to treat time as millis,
such as having a javascript extractionFn after a timeFormat.
2016-12-02 13:45:16 -08:00
Gian Merlino 102375d9bb Add "strlen" extractionFn. (#3731) 2016-12-02 12:08:51 -08:00
Gian Merlino 4c5d10f8a3 Add DimFilterHavingSpec. (#3727)
* Add DimFilterHavingSpec.

* Add test for DimFilterHavingSpec with extractionFns.
2016-12-02 10:04:30 -08:00
Gian Merlino 68735829ca Add, fix equals, hashCode, toString on various classes. (#3723)
* TimeFormatExtractionFn: Add toString.

* InDimFilter: Add toString, allow accepting any Collection of values.

* DimensionTopNMetricSpec: Fix toString.

* InvertedTopNMetricSpec: Add toString.

* HyperUniqueFinalizingPostAggregator: Add equals, hashCode, toString.
2016-11-30 19:00:14 -08:00
Gian Merlino 477e0cab7c Filter fixes and tests (#3724)
* More robust Filter tests.

All Filter tests now exercise the CNF and post-filtering features.

* Fixes to RowBasedValueMatcherFactory and to bound filters.

- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
2016-11-30 16:10:05 -08:00
Gian Merlino 6922d684bf GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686)
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.

These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
2016-11-29 12:35:03 +05:30
Roman Leventov c070b4a816 Fix concurrency defects, remove unnecessary volatiles (#3701) 2016-11-22 16:42:28 -08:00
Roman Leventov 7b56cec3b9 Fix resource leaks (#3702) 2016-11-18 21:21:36 +05:30
Gian Merlino 7e80d1045a Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698)
This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
2016-11-16 20:02:25 -08:00
Keuntae Park 094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Gian Merlino 9ad34a3f03 groupBy v1: Force all dimensions to strings. (#3685)
Fixes #3683.
2016-11-14 09:30:18 -08:00
Jisoo Kim 7c0f462fbc fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666) 2016-11-11 18:46:25 -08:00
Roman Leventov fbbb55f867 Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679)
* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
2016-11-11 17:17:27 -06:00
Akash Dwivedi 3e408497b3 Migrating bytebuffercollections from Metamarkets. (#3647)
* Migrating  bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
2016-11-11 10:51:07 -08:00
Gian Merlino fd5451486c Short-circuiting AndFilter. (#3676)
If any of the bitmaps are empty, the result will be false.
2016-11-11 10:14:56 -08:00
Gian Merlino 657e4512d2 Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660)
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
2016-11-05 11:34:36 -07:00
Gian Merlino 4cbebd0931 SubstringDimExtractionFn, BoundDimFilter: Implement typical style toString. (#3658) 2016-11-04 13:31:47 -07:00
Gian Merlino 600bbd4a17 BucketExtractionFn: Implement hashCode, fix toString. (#3656) 2016-11-04 11:24:02 -07:00
Gian Merlino 8b3c86f41f Fix FilteredAggregatorFactory toString formatting. (#3657) 2016-11-04 11:23:55 -07:00
Gian Merlino 2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
Navis Ryu b99e14e732 Support configuration for handling multi-valued dimension (#2541)
* Support configuration for handling multi-valued dimension

* Addressed comments

* use MultiValueHandling.ofDefault() for missing policy
2016-11-03 22:38:54 -06:00
Navis Ryu e10def32f2 Support string type in math expression (#2836)
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
2016-11-02 21:10:48 -06:00
kaijianding 2961406b90 fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644) 2016-11-02 15:40:07 +05:30
Roman Leventov 4b0d6cf789 Fix resource leaks (ComplexColumn and GenericColumn) (#3629)
* Remove unused ComplexColumnImpl class

* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs

* Use concise try-with-resources syntax in several places

* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter

* Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts thew exceptions
2016-11-02 09:23:52 +05:30
Gian Merlino 45940d6e40 Math expressions support for missing columns. (#3630)
Also add SchemaEvolutionTest to help test this kind of thing.

Fixes #3627 and includes test for #3625.
2016-11-01 09:40:25 -07:00
Gian Merlino 89d9c61894 Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572) 2016-10-31 15:24:30 -07:00
Navis Ryu 3fca3be9ea SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617) 2016-10-30 11:47:29 -07:00
Himanshu 23a8e22836 fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613) 2016-10-28 15:48:33 -07:00
Navis Ryu 898c1c21af More best-effort parse long (#3603)
* More best-effort parse long

* addressed comments
2016-10-25 10:31:51 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Navis Ryu 8b7ff4409a Math expressional parameters for aggregator (#2783)
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820)

* Addressed comments

* addressed comments
2016-10-19 13:58:35 -05:00
Roman Leventov b113a34355 In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587) 2016-10-18 09:06:42 -05:00
Charles Allen 2c5c8198db Make query/cpu/time still report on error (#3535) 2016-10-18 08:26:21 -05:00
Roman Leventov 9611358f0a Small topn scan improvements (#3526)
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
2016-10-17 10:36:19 -07:00
Gian Merlino 285516bede Workaround non-thread-safe use of HLL aggregators. (#3578)
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
2016-10-17 09:39:12 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Roman Leventov 85ac8eff90 Improve performance of IndexMergerV9 (#3440)
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable

* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
2016-10-13 08:28:46 -07:00
Charles Allen 76e77cb610 Make segment creation gauva 14 friendly (#3520) 2016-10-05 15:25:03 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Charles Allen 654e1db309 Add simple test to FunctionalExtractionTest (#3522) 2016-09-28 23:45:15 -07:00
Gian Merlino d5a8a35fec groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502)
- Fix GroupByRowProcessor config overrides
- Fix GroupByRowProcessor resource limit checking
- Invert subquery context overrides such that for the subquery, its own
  keys override keys from the outer query, not the other way around.

The last bit is necessary for the test to work, and seems like a better
way to do it anyway.
2016-09-23 14:41:09 -07:00
Gian Merlino 7195be32d8 groupBy v2: Fix dangling references. (#3500)
Acquiring references in the processing task prevents dangling references
caused by canceled processing tasks.
2016-09-24 01:59:11 +05:30
Gian Merlino f8d71fc602 groupBy: Fix maxMergingDictionarySize config. (#3488) 2016-09-22 10:02:33 -07:00
Gian Merlino c87ecea975 Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487) 2016-09-22 09:12:02 -07:00
Navis Ryu 49c0fe0e8b Show candidate hosts for the given query (#2282)
* Show candidate hosts for the given query

* Added test cases & minor changes to address comments

* Changed path-param to query-pram for intervals/numCandidates
2016-09-22 08:32:38 -07:00
Keuntae Park 54ec4dd584 Support renaming of outputName for cached select and search query results (#3395)
* support renaming of outputName for cached select and search queries

* rebase and resolve conflicts

* rollback CacheStrategy interface change

* updated based on review comments
2016-09-20 08:19:14 -07:00
Charles Allen 95e08b38ea [QTL] Reduced Locking Lookups (#3071)
* Lockless lookups

* Fix compile problem

* Make stack trace throw instead

* Remove non-germane change

* * Add better naming to cache keys. Makes logging nicer
* Fix #3459

* Move start/stop lock to non-interruptable for readability purposes
2016-09-16 11:54:23 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Gian Merlino d108461838 groupBy v2: Parallel disk spilling. (#3433)
In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch
from hash-based partitioning to thread-based partitioning. This stops processing
threads from blocking each other while spilling is occurring.
2016-09-09 16:49:58 -06:00
Gian Merlino 1e3f94237e groupBy v2: Configurable load factor. (#3437)
Also change defaults:

- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
  for most heaps.
2016-09-07 14:14:59 -05:00
Roman Leventov 4f0bcdce36 Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 (#3422)
* Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The exact purpose for this change is to allow running IndexMergeBenchmark in Windows, however should also be universally 'better' than non-deterministic unmapping, done when MappedByteBuffers are garbage-collected (BACKEND-312)

* Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9

* Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources

* Reformat IndexIO
2016-09-07 10:43:47 -07:00
Gian Merlino 8d2ae144a8 groupBy: Short-circuit identity preCompute manipulators. (#3434) 2016-09-06 22:28:44 -06:00
Gian Merlino 1d07964987 LimitedTemporaryStorage: Fix perf bug. (#3432)
FilterOutputStream has an inefficient implementation of write(byte[], int, int).
So let's extend OutputStream directly and use efficient implementations of all
methods.
2016-09-06 15:39:36 -07:00
Gian Merlino 8ed1894488 groupBy: Omit timestamp from merge key when granularity = all. (#3416)
Fixes #3412.
2016-09-01 09:02:54 -07:00
Gian Merlino 6d25c5e053 Avoid materializing all groupBy results with order + limit. (#3410)
The old TopNFunction code did Sequences.toList on the input sequence before
using a priority queue to find the top N items. Now, the priority queue
is used in an accumulator, so there is no need to fully materialize the results.

Also removed equals/hashCode from the limitFn and remove limitFn from the
GroupByQuery's hashCode, since that wasn't necessary and the implementation
of hashCode wasn't correct anyway.
2016-08-31 14:08:07 -07:00
Gian Merlino 1268e2902c Add groupBy test for multiple multi-value dimensions. (#3415) 2016-08-31 11:21:10 -07:00
Gian Merlino e9050c2b4c TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411) 2016-08-31 20:58:53 +05:30
Keuntae Park 0076b5fc1a Interval bug fix for search query (#2903)
* support query granularity and interval for search query

* skip unncessary bitmap calculation when query interval contains whole the data interval of the given segments.

* use binary search to find start and end index for the given interval

* fix based on comment

* bug fix based on the review comments and add unit tests
2016-08-31 20:52:44 +05:30
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Jonathan Wei 4e91330a17 Use DimensionSpec in CardinalityAggregatorFactory (#3406)
* Use DimensionSpec in CardinalityAggregatorFactory

* Address PR comments

* Fix requiredFields()
2016-08-30 15:54:02 -07:00
Gian Merlino b11e9544ea GroupBy v2: Improve hash code distribution. (#3407)
Without this transformation, distribution of hash % X is poor in general.
It is catastrophically poor when X is a multiple of 31 (many slots would
be empty).
2016-08-30 12:09:08 +05:30
kaijianding f037dfcaa4 fix missing segments duplicate retried (#3398) 2016-08-29 23:46:21 +05:30
jaehong choi 2e0f253c32 introducing lists of existing columns in the fields of select queries' output (#2491)
* introducing lists of existing columns in the fields of select queries' output

* rebase master

* address the comment. add test code for select query caching

* change the cache code in SelectQueryQueryToolChest to 0x16
2016-08-25 21:37:53 +05:30
rajk-tetration 362b9266f8 Adding filters for TimeBoundary on backend (#3168)
* Adding filters for TimeBoundary on backend

Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>

* updating TimeBoundaryQuery constructor in QueryHostFinderTest

* add filter helpers

* update filterSegments + test

* Conditional filterSegment depending on whether a filter exists

* Style changes

* Trigger rebuild

* Adding documentation for timeboundaryquery filtering

* added filter serialization to timeboundaryquery cache

* code style changes
2016-08-15 10:25:24 -07:00
Gian Merlino e1b0b7de3e IndexBuilder: Allow replacing rows, customizable maxRows. (#3359) 2016-08-12 15:22:45 -07:00
Jonathan Wei 454587857c Make StringComparator deserialization case-insensitive (#3356) 2016-08-11 18:00:11 -07:00
Himanshu 043562914d Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341) 2016-08-10 11:03:44 -07:00
Gian Merlino 1eb7a7e882 Restore optimizations in BoundFilter. (#3343) 2016-08-10 08:53:17 -07:00
Gian Merlino a2bcd97512 IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344)
They had arrays as values, which MapBasedRow doesn't understand and
toStrings rather than converting to lists.
2016-08-10 08:47:29 -07:00
Jonathan Wei 890e3bdd3f More informative query unit test names (#3342) 2016-08-09 22:24:48 -07:00
Gian Merlino 8899affe48 Introduce standardized "Resource limit exceeded" error. (#3338)
Fixes #3336.
2016-08-09 10:50:56 -07:00
Gian Merlino 21bce96c4c More useful query errors. (#3335)
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.

With this patch, the error fields are:

- error, one of the specific strings "Query interrupted", "Query timeout",
  "Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
  in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
  in the causality chain.
- host, the host that failed the query.
2016-08-09 16:14:52 +08:00
Gian Merlino 1aae5bd67d Nicer handling for cancelled groupBy v2 queries. (#3330)
1. Wrap temporaryStorage in a resource holder, to avoid spurious "Closed"
   errors from already-running processing tasks.
2. Exit early from the merging accumulator if the query is cancelled.
2016-08-05 14:48:06 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Gian Merlino 9437a7a313 HLL: Avoid some allocations when possible. (#3314)
- HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
- HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
- Updated call sites where appropriate to duplicate BBs passed to HLLC.
2016-08-03 18:08:52 -07:00
Gian Merlino a4b95af839 Fix grouper closing in GroupByMergingQueryRunnerV2. (#3316)
The grouperHolder should be closed on failure, not the grouper.
2016-08-02 21:02:30 -07:00
Gian Merlino 0299ac73b8 Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312)
The common theme between the two is they both create "fake" DimensionSelectors
that work on top of Rows. They both do it because there isn't really any
dictionary for the underlying Rows, they're just a stream of data. The fix for
both is to allow a DimensionSelector to tell callers that it has no dictionary
by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in
turn, can avoid using it in ways that assume it has a dictionary.

Fixes #3311.
2016-08-02 17:39:40 -07:00
Gian Merlino ae3e0015b6 Fix ClassCastException in nested v2 groupBys with timeouts. (#3310)
Add tests for the CCE and for a bunch of other groupBy stuff.

Also avoids setting the interrupted flag when InterruptedExceptions
happen, since this might interfere with resource closing, no other
query does it, and is probably pointless anyway since the thread
is likely to be a jetty thread that we don't actually want to set
an interrupt flag on.

Also fixes toString on OrderByColumnSpec.
2016-08-02 16:02:44 -06:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Jonathan Wei 0bdaaa224b Use Long.compare for NumericComparator when possible (#3309) 2016-08-01 20:36:56 -07:00
Dave Li bc20658239 groupBy nested query using v2 strategy (#3269)
* changed v2 nested query strategy

* add test for #3239

* update for new ValueMatcher interface and add benchmarks

* enable time filtering

* address PR comments

* add failing test for outer filter aggregator

* add helper class for sharing code

* update nested groupby doc

* move temporary storage instantiation

* address PR comment

* address PR comment 2
2016-08-01 18:30:39 -07:00
Jonathan Wei a6105cbb86 Add numeric StringComparator (#3270)
* Add numeric StringComparator

* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query

* Address PR comments, add multithreaded BoundDimFilter test

* Add comment on strlen tie handling

* Add timeseries interval filter benchmark

* Adjust docs

* Use jackson for StringComparator, address PR comments

* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)

* More TopNMetricSpec and SearchSortSpec tests

* Fix NewSearchSortSpec serde

* Update docs for new DimensionTopNMetricSpec

* Delete NumericDimensionTopNMetricSpec

* Delete old SearchSortSpec

* Rename NewSearchSortSpec to SearchSortSpec

* Add TopN numeric comparator benchmark, address PR comments

* Refactor OrderByColumnSpec

* Add null checks to NumericComparator and String->BigDecimal conversion function

* Add more OrderByColumnSpec serde tests
2016-07-29 15:44:16 -07:00
Navis Ryu 884017d981 "all" type search query spec (#3300)
* "all" type search query spec

* addressed comments

* added unit test
2016-07-28 18:16:15 -07:00
Gian Merlino 2553997200 Associate groupBy v2 resources with the Sequence lifecycle. (#3296)
This fixes a potential issue where groupBy resources could be allocated to
create a Sequence, but then the Sequence is never used, and thus the resources
are never freed.

Also simplifies how groupBy handles config overrides (this made the new
unit test easier to write).
2016-07-27 18:44:19 -07:00
Gian Merlino 9b5523add3 Reference counting, better error handling for resources in groupBy v2. (#3268)
Refcounting prevents releasing the merge buffer, or closing the concurrent
grouper, before the processing threads have all finished. The better
error handling prevents an avalanche of per-runner exceptions when grouping
resources are exhausted, by grouping those all up into a single merged
exception.
2016-07-27 01:59:02 +05:30
Erik Dubbelboer 76fabcfdb2 Fix #2782, Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231)
On systems with only once processor this test fails.
2016-07-25 15:51:09 -07:00
kaijianding 3dc2974894 Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227)
* save TimestampSpec in metadata.drd

* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Gian Merlino 06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
Nishant 7995818220 Increase test timeout to prevent failing on slow machines (#3224)
constantly timing out on one of slow build machines, increasing the
timeout fixed it.

Running io.druid.granularity.QueryGranularityTest
Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776
sec - in io.druid.granularity.QueryGranularityTest
2016-07-17 18:44:48 -07:00
Gian Merlino 6cd1f5375b Better harmonized dimensions for query metrics. (#3245)
All query metrics now start with toolChest.makeMetricBuilder, and all of
*those* now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id"
moved to common code, since all query metrics added it anyway.

In particular this will add query-type specific dimensions like "threshold"
and "numDimensions" to servlet-originated metrics like query/time.
2016-07-14 11:55:51 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
Jonathan Wei f3a3662133 Fix compile error in SearchBinaryFnTest (#3201) 2016-06-29 09:44:45 -05:00
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add Ut for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Dave Li 8a08398977 Add segment pruning based on secondary partition dimension (#2982)
* add get dimension rangeset to filters

* add get domain to ShardSpec and added chunk filter in caching clustered client

* add null check and modified not filter, started with unit test

* add filter test with caching

* refactor and some comments

* extract filtershard to helper function

* fixup

* minor changes

* update javadoc
2016-06-24 14:52:19 -07:00
michaelschiff 66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and (#3176)
'segment/underReplicated/count' (#3173)
2016-06-23 14:53:15 -07:00
Gian Merlino da660bb592 DumpSegment tool. (#3182)
Fixes #2723.
2016-06-23 14:37:50 -07:00
Gian Merlino a437fb150b Fix SegmentMetadataQuery when queryGranularity is requested but not present. (#3181) 2016-06-23 14:30:50 -07:00
Jonathan Wei 24860a1391 Two-stage filtering (#3018)
* Two-stage filtering

* PR comment
2016-06-22 16:08:21 -07:00
Nishant f46ad9a4cb support Union Segment metadata queries (#3132)
* support Union Segment metadata queries

fix 3128

* remove extraneous sys out
2016-06-21 10:30:50 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Nishant 0d427923c0 fix caching for search results (#3119)
* fix caching for search results

properly read count when reading from cache.

* fix NPE during merging search count and add test

* Update cache key to invalidate prev results
2016-06-09 17:49:47 -07:00
Gian Merlino 5998de7d5b Fix lenient merging of conflicting aggregators. (#3113)
This should have marked the conflicting aggregator as null, but instead it
threw an NPE for the entire query.
2016-06-08 15:56:48 -07:00
Jonathan Wei 37c8a8f186 Speed up filter tests with adapter cache (#3103) 2016-06-08 07:41:10 -07:00
Gian Merlino 54139c6815 Fix NPE in registeredLookup extractionFn when "optimize" is not provided. (#3064) 2016-06-03 12:58:17 -05:00
Gian Merlino 6171e078c8 Improve NPE message in LookupDimensionSpec when lookup does not exist. (#3065)
The message used to be empty, which made things hard to debug.
2016-06-02 19:59:12 -07:00
John Wang e662efa79f segment interface refactor for proposal 2965 (#2990) 2016-05-26 20:36:41 -07:00
Kurt Young b5bd406597 fix #2991: race condition in OnheapIncrementalIndex#addToFacts (#3002)
* fix #2991: race condition in OnheapIncrementalIndex#addToFacts

* add missing header

* handle parseExceptions when first doing first agg
2016-05-25 19:05:46 -07:00
Jonathan Wei b72c54c4f8 Add benchmark data generator, basic ingestion/persist/merge/query benchmarks (#2875) 2016-05-25 16:39:37 -07:00
Dave Li dcabd4b1ee Add lookup optimization for InDimFilter (#2938)
* Add lookup optimization for InDimFilter

* tests for in filter with lookup extraction fn

* refactor

* refactor2 and modified filter test

* make optimizeLookup private
2016-05-19 16:29:16 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Charles Allen fb01db4db7 [QTL] Allows RegisteredLookupExtractionFn to find its lookups lazily (#2971)
* Allows RegisteredLookupExtractionFn to find its lookups lazily

* Use raw variables instead of AtomicReference

* Make sure to use volatile

* Remove extra local variable.

* Move from BAOS to ByteBuffer
2016-05-17 11:29:39 -07:00
Himanshu d3e9c47a5f use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
Himanshu d821144738 at historicals GpBy query mergeResults does not need merging as results are already merged by GroupByQueryRunnerFactory.mergeRunners(..) (#2962) 2016-05-12 17:41:24 -07:00
Gian Merlino 01bebf432a GroupByQuery: Multi-value dimension tests. (#2959) 2016-05-12 11:31:50 -07:00
Charles Allen a31348450f Add toString for LookupConfig (#2935)
* Helps with operations and getting where the snapshot dir is
2016-05-09 18:20:00 -07:00
Dave Li 79a54283d4 Optimize filter for timeseries, search, and select queries (#2931)
* Optimize filter for timeseries, search, and select queries

* exception at failed toolchest type check

* took out query type check

* java7 error fix and test improvement
2016-05-09 11:04:06 -07:00
Slim 8b570ab130 make it clear what LookupExtractorFactory start/stop methods return (#2925) 2016-05-05 10:38:40 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Himanshu 8e2742b7e8 adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
Gian Merlino 40e595c7a0 Remove types from TimeAndDims, they aren't needed. (#2865) 2016-05-03 13:10:25 -05:00
binlijin 841be5c61f periodically emit metric segment/scan/pending (#2854) 2016-05-02 22:38:13 -07:00
Navis Ryu 2729fea84d Fix parsing fail of segment id with datasource containing underscore (#2797)
* Fix parsing fail of segment id with underscored datasource (Fix for #2786)

* addressed comment

* renamed and moved code into api. added log4 dependency for tests

* addressed comments

* fixed test fails
2016-05-02 22:37:28 -07:00
Gian Merlino 90ce03c66f Fix integer overflow in SegmentMetadataQuery numRows. (#2890) 2016-04-27 14:37:04 -07:00
Gian Merlino 6dc7688a29 TimeAndDims equals/hashCode implementation. (#2870)
Adapted from #2692, thanks @navis for original implementation.
2016-04-22 08:45:20 +08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00
Gian Merlino c74391e54c JavaScript: Ability to disable. (#2853)
Fixes #2852.
2016-04-21 09:43:15 -05:00
Gian Merlino 7d3e55717d Reduce cost of various toFilter calls. (#2860)
These happen once per segment and so it's better if they don't do
as much work.
2016-04-21 04:28:46 +08:00
Gian Merlino 59460b17cc Add Filters.matchPredicate helper, use it where appropriate. (#2851)
This approach simplifies code and is generally faster, due to skipping
unnecessary dictionary lookups (see #2850).
2016-04-19 15:54:32 -07:00
Xavier Léauté b2745befb7 remove obsolete comment (#2858) 2016-04-19 13:06:58 -07:00
Jisoo Kim 7b65ca7889 refactor ClientQuerySegmentWalker (#2837)
* refactor ClientQuerySegmentWalker

* add header to FluentQueryRunnerBuilder

* refactor QueryRunnerTestHelper
2016-04-18 14:00:47 -07:00
Gian Merlino 7c0b1dde3a DimensionPredicateFilter: Skip unnecessary dictionary lookup. (#2850) 2016-04-18 12:38:25 -07:00
Jonathan Wei b534f7203c Fix performance regression from #2753 in IndexMerger (#2841) 2016-04-14 21:39:41 -07:00
Jonathan Wei a26134575b Fix NPE in TopNLexicographicResultBuilder.addEntry() (#2835) 2016-04-13 17:27:16 -07:00
Fangjin Yang abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
michaelschiff db35dd7508 fix issue #2744. Check for null before combining metrics (#2774) 2016-04-12 14:46:31 -07:00
Nishant 1bf1dd03a0 Merge pull request #2812 from mrijke/fix-missing-equals-hashcode-filters
Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters
2016-04-12 12:00:23 +05:30
Charles Allen 21e406613c Merge pull request #2809 from metamx/fix2694
Fix test for snapshot taker to better check for lookup perist failure
2016-04-11 14:52:47 -07:00
Maarten Rijke de68d6b7c4 Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters
This commits adds missing equals() and hashcode() methods to
 the JavascriptDimFilter, RegexDimFilter and the SearchQueryDimFilter.
2016-04-11 12:16:24 +02:00
Nishant bbb326decf Merge pull request #2799 from b-slim/fix_snapshot
MapLookupFactory need to be Ser/Desr ready.
2016-04-07 13:22:34 +05:30
Slim Bouguerra bf1eafc4e1 remove all the mock lookupFactory 2016-04-06 15:37:52 -05:00
Slim Bouguerra 59eb2490a0 MapLookupFactory need to be Ser/Desr. 2016-04-06 15:02:18 -05:00
Charles Allen f915a59138 Merge pull request #2691 from metamx/lookupExtrFn
Add ExtractionFn to LookupExtractor bridge
2016-04-06 09:13:08 -07:00
jon-wei 051fd6c0eb Remove extra println from InFilter 2016-04-05 14:55:49 -07:00
Fangjin Yang 289bb6f885 Merge pull request #2690 from jon-wei/filter_support
Allow filters to use extraction functions
2016-04-05 15:40:15 -06:00
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Gian Merlino e060a9f283 Additional ExtractionFn null-handling adjustments.
Followup to comments on #2771.
2016-04-01 18:35:26 -07:00
Fangjin Yang 18b9ea62cf Merge pull request #2771 from gianm/extractionfn-stuff
Various ExtractionFn null handling fixes.
2016-04-01 16:35:46 -07:00
Gian Merlino 23d66e5ff9 Merge pull request #2765 from navis/invalid-encode-nullstring
Null string is encoded as "null" in incremental index
2016-04-01 14:43:40 -07:00
Gian Merlino b6e4d8b2c1 Various ExtractionFn null handling fixes.
- JavaScriptExtractionFn shouldn't pass empty strings to its JS functions
- Upper/LowerExtractionFn properly handles null Objects (DimExtractionFn's implementation works here)
- MatchingDimExtractionFn properly returns nulls rather than empties
- RegexDimExtractionFn properly attempts matching on nulls and empties
- SearchQuerySpecDimExtractionFn properly returns nulls when passed empties
2016-04-01 14:34:47 -07:00
Fangjin Yang eea7a47870 Merge pull request #2576 from navis/paging-from-next
Add option for select query to get next page without modifying returned paging identifiers
2016-04-01 13:50:36 -07:00
Fangjin Yang 4eb5a2c4f1 Merge pull request #2715 from navis/stringformat-null-handling
stringFormat extractionFn should be able to return null on null values (Fix for #2706)
2016-04-01 13:45:28 -07:00
Gian Merlino 23364a47fd BaseFilterTest: Test optimized filters too. 2016-04-01 12:44:59 -07:00
navis.ryu 077522a46f stringFormat extractionFn should be able to return null on null values (Fix for #2706) 2016-04-01 13:40:56 +09:00
navis.ryu f0e55f5d31 Null string is encoded as "null" in incremental index 2016-04-01 09:47:15 +09:00
navis.ryu 29bb00535b Add option for select query to get next page without modifying returned paging identifiers 2016-04-01 09:03:03 +09:00
Gian Merlino 5f9240fcbc Merge pull request #2577 from navis/native-in-filter
Implement native in filter
2016-03-30 20:02:54 -07:00
Fangjin Yang 3d68da94fe Merge pull request #2661 from navis/utf8-estimated-length
Utility method for length estimation of utf8
2016-03-30 19:56:14 -07:00
navis.ryu 108535fd07 Implement native in filter (Fix for #2577) 2016-03-31 10:10:57 +09:00
navis.ryu e0cfd9ee19 Utility method for length estimation of utf8 2016-03-31 10:07:00 +09:00
jon-wei 5503bf1b38 Remove unnecessary type check in TimeAndDimsComp 2016-03-30 17:54:15 -07:00
Fangjin Yang 95733a362f Merge pull request #2753 from gianm/null-filtering-multi-value-columns
More consistent empty-set filtering behavior on multi-value columns.
2016-03-29 18:52:25 -07:00
Charles Allen 95d42cfd9e Merge pull request #2758 from pjain1/fix_npe_in_filter
handle null values in In Filter
2016-03-29 17:53:02 -07:00
Gian Merlino 1853f36e9f More consistent empty-set filtering behavior on multi-value columns.
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.

Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.

Fixes #2750.
2016-03-29 15:32:13 -07:00
Parag Jain d892918a3d handle null values in In Filter 2016-03-29 17:03:26 -05:00
Fangjin Yang e023df2b92 Merge pull request #2754 from gianm/i-dont-get-it
Remove error suppression code from IncrementalIndexAdapter.
2016-03-28 19:29:53 -07:00
Gian Merlino c7ff0d698e Remove error suppression code from IncrementalIndexAdapter. 2016-03-28 18:40:27 -07:00
fjy c418a55638 cleanup distinct count agg 2016-03-28 17:29:41 -07:00
Fangjin Yang 9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Charles Allen 4a98c4fbac Fix LookupExtractionFn equals and hashCode 2016-03-28 13:14:43 -07:00
Charles Allen 0ee861d0da Add ExtractionFn to LookupExtractor bridge 2016-03-28 13:14:43 -07:00
Fangjin Yang 7fe277e6da Merge pull request #2727 from gianm/optimize-bound-filter
BoundFilter optimizations, and related interface changes.
2016-03-26 18:59:05 -07:00
Fangjin Yang 0dae28b6af Merge pull request #2729 from jon-wei/fix_hyperunique_comparator
Fix HyperUniquesAggregatorFactory comparator
2016-03-26 15:39:35 -07:00
Gian Merlino 2970b49adc BoundFilter optimizations, and related interface changes.
BoundFilter:

- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points,
  then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values
  matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls,
  now it matches nulls if "" is allowed by the lower limit and not excluded by the
  upper limit.

Interface changes:

- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a
  value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.

Also added tests for SelectorFilter, NotFilter, and BoundFilter.
2016-03-25 14:11:48 -07:00
jon-wei 9afaa2b94a Fix HyperUniquesAggregatorFactory comparator 2016-03-25 12:36:42 -07:00
Gian Merlino 4ac9e03161 Fix predicate-based ValueMatcher behavior for IncrementalIndex on missing columns.
Missing columns should be treated the same as columns containing 100% nulls.
2016-03-25 10:23:59 -07:00
Himanshu Gupta e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Nishant 0b03c9405f Merge pull request #2614 from sirpkt/calendric_gran
Support week, month, quarter, and year in query granularity
2016-03-24 16:21:01 -07:00
Himanshu 56343c6cdc Merge pull request #2704 from navis/simple-optimize
optimize single elemented and/or filter
2016-03-24 16:13:48 -05:00
Gian Merlino 713062053c Filters: Add filter.toFilter method, use that instead of the instanceof chain in Filters.
I believe that the instanceof chain in Filters exists because in the past, Filter
and DimFilter were in different packages (DimFilter was in druid-client and Filter
was in druid-processing). And since druid-client didn't depend on druid-processing,
DimFilter couldn't have a toFilter method. But now it can.
2016-03-23 17:03:49 -07:00
Gian Merlino dd86198902 All Filters should work with FilteredAggregators.
This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a
ValueMatcherFactory implementation to FilteredAggregatorFactory so it can
take advantage of existing makeMatcher(ValueMatcherFactory) implementations.

This patch also removes the Bound-based method from ValueMatcherFactory. Its
only user was the SpatialFilter, which could use the Predicate-based method.

Fixes #2604.
2016-03-23 12:24:01 -07:00
binlijin 57d78d3293 clean tmp file when index merge fail 2016-03-23 10:55:12 +08:00
navis.ryu 91f6be4884 optimize single elemented and/or filter 2016-03-23 09:29:15 +09:00
Gian Merlino ff25325f3b Improved docs for multi-value dimensions.
- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
  linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
  "multi-value") in favor of "multi-value".
2016-03-22 14:40:55 -07:00
jon-wei a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Keuntae Park 7f29f2ac3b support week, month, quarter, year in query granularity 2016-03-21 17:41:53 +09:00
Charles Allen 5da9a280b6 Query Time Lookup - Dynamic Configuration 2016-03-18 09:45:05 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Slim cf342d8d3c Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility
[QTL][Lookup] lookup module with the snapshot utility
2016-03-17 11:39:47 -05:00
Slim Bouguerra 0c86b29ef0 lookup module with the snapshot utility 2016-03-17 09:20:41 -05:00
Charles Allen 2ac8a22173 Merge pull request #2579 from metamx/closerIsCloser
Make CloserRule use guava's Closer
2016-03-14 17:18:19 -07:00
Charles Allen a64979463f Make CloserRule use guava's Closer 2016-03-14 15:01:24 -07:00
Fangjin Yang 06813b510a Merge pull request #2571 from himanshug/gp_by_avoid_sort
avoid sort while doing groupBy merging when possible
2016-03-14 14:46:51 -07:00
Fangjin Yang dbdbacaa18 Merge pull request #2260 from navis/cardinality-for-searchquery
Support cardinality for search query
2016-03-14 13:24:40 -07:00
Slim 8cc3582e70 Merge pull request #2644 from metamx/optimize-timeboundary
optimize timeboundary for min or max bound
2016-03-13 13:16:24 -05:00
navis.ryu be341bf4e3 Support cardinality for search query (Fix for #2260) 2016-03-12 09:51:01 +09:00
Xavier Léauté 6f0d6ef0e9 optimize timeboundary for min or max bound 2016-03-11 14:11:47 -08:00
Gian Merlino 8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Himanshu Gupta dc0214bddb while GroupBy merging use unsorted facts in IncrementalIndex wherever possible 2016-03-10 16:11:48 -06:00
Himanshu Gupta 02dfd5cd80 update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance 2016-03-10 16:11:48 -06:00
Xavier Léauté 90d7409e1a Merge pull request #2611 from himanshug/gp_by_max_limit
only allow lowering maxResults and maxIntermediateRows from groupBy query context
2016-03-10 13:44:13 -08:00
Gian Merlino a2b1652787 Clarify parser docs.
- Clarify what parseSpecs are used for.
- Avro, Protobuf should use timeAndDims parseSpecs.
- Hadoop jobs should use hadoopyString string parsers.
2016-03-10 08:45:04 -08:00
Fangjin Yang 68cffe1d91 Merge pull request #2615 from gianm/timeseries-skipEmptyBuckets-cache
Fix caching of skipEmptyBuckets for TimeseriesQuery.
2016-03-09 18:45:59 -08:00
Gian Merlino 708bc674fa Make specifying query context booleans more consistent.
Before, some needed to be strings and some needed to be real booleans. Now
they can all be either one.
2016-03-08 19:38:26 -08:00
Gian Merlino 40dad6dff4 Fix caching of skipEmptyBuckets for TimeseriesQuery. 2016-03-08 19:22:12 -08:00
Himanshu Gupta ca5de3f583 only allow lowering maxResults and maxIntermediateRows from groupBy query context 2016-03-08 15:03:59 -06:00
Himanshu Gupta 099acb4966 allow groupBy max[Intermediate]Rows limit be overridable by context 2016-03-07 15:22:41 -06:00
Himanshu Gupta c544ebf25e reintroducing the safety check removed in commit-1d602be so that dim value ids are less than cardinality 2016-03-03 23:34:23 -06:00
Bingkun Guo 4a58462fc7 update querySegmentSpec when passing query to getQueryRunner
After finding the FireChief for a specific partition, Druid will need to find the specific queryRunner for each segment being queried by passing the query to FireChief. Currently Druid is passing the original query that contains all the segments need to be queried, it's possible that fireChief.getQueryRunner(query) returns more than 1 queryRunner because query.getIntervals() is not specific to a single segment.

In this patch, for each segment being queried, Druid will update the query with its corresponding SpecificSegmentSpec.
2016-03-02 16:44:56 -06:00
Nishant 31b502773a Merge pull request #2480 from navis/pagingfail-over-segments
Select query cannot span to next segment with paging
2016-03-01 11:42:41 +05:30
Fangjin Yang e5c25725c0 Merge pull request #2562 from himanshug/fix_2556
with nested GpBy query outer query results need to be further merged
2016-02-29 12:17:33 -08:00
Himanshu Gupta 0722ced413 with GpBy query outer query results need to be further merged 2016-02-29 10:16:25 -06:00
navis.ryu b1ff920831 Lazily initialize predicate for bound filter 2016-02-29 15:35:52 +09:00
navis.ryu 5f1e60324a Added more complex test case with versioned segments 2016-02-29 14:48:24 +09:00
navis.ryu 2686bfa394 Select query cannot span to next segment with paging 2016-02-29 00:01:46 +09:00
Fangjin Yang 29d29ba98d Merge pull request #2263 from jon-wei/flex_dims3
Allow IncrementalIndex to store Long/Float dimensions
2016-02-25 17:23:02 -08:00
jon-wei c17ce02467 Allow IncrementalIndex to store Long/Float dimensions 2016-02-24 13:51:57 -08:00
jon-wei fd3782522c Rename 'replaceMissingValues...' parameters in RegexExtractionFn 2016-02-24 13:12:56 -08:00
Nishant fb7eae34ed Merge pull request #2249 from metamx/workerExpanded
Use Worker instead of ZkWorker whenever possible
2016-02-24 13:23:22 +05:30
Charles Allen ac13a5942a Use Worker instead of ZkWorker whenver possible
* Moves last run task state information to Worker
* Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker
2016-02-23 15:02:03 -08:00
Gian Merlino 3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Fangjin Yang 3bdd757024 Merge pull request #1773 from b-slim/log_details
Adding downstream source when throwing QueryInterruptedException
2016-02-22 10:16:07 -08:00
Slim Bouguerra 77925cc061 adding downstream source of QueryInterruptedException 2016-02-20 13:05:14 -06:00
Fangjin Yang 8ee81947cd Merge pull request #2494 from himanshug/fix_timeseries
do not drop post-aggs in TimeseriesQueryToolChest.makePreComputeManipulatorFn
2016-02-20 10:37:32 -08:00
Gian Merlino d25c46cb9f Add comparator to HyperUniquesFinalizingPostAggregator.
This makes it possible to do groupBys with clauses like "HAVING uniques > 10".
Beforehand you couldn't do it with either an aggregator (because it returns
an HLLV1 which the havingSpec can't understand) or a finalized postaggregator
(because it didn't have a comparator).

Now you can at least do it with a finalizing postaggregator. Trying it with
the aggregator alone still doesn't work.

Added some topN and groupBy tests verifying the comparator, and added an
@Ignore test that should pass if havingSpecs are made work on the aggregator
directly.
2016-02-19 08:36:08 -08:00
Himanshu Gupta 11b0117422 do not drop post-aggs in timeseries query tool chest makePreComputeManipulatorFn like other query types 2016-02-17 20:51:35 -06:00
Jaehong Choi 32b9d57b23 handle a failing UT in GroupByQueryRunnerTest after merging into the master 2016-02-16 16:56:57 +09:00
Jaehong Choi b25bca85bc Merge branch 'master' of https://github.com/druid-io/druid into support-alphanumeric-dimensional-sort-in-gropu-by 2016-02-16 16:42:05 +09:00
Jaehong Choi e89afc901b delete System.out.println() in test code 2016-02-16 15:26:37 +09:00
Navis Ryu cd315627c9 Merge pull request #2393 from CHOIJAEHONG1/support-alphanumeric-dimensional-sort-in-gropu-by
support alphanumeric sorting for dimensional columns in groupby (#2393)
2016-02-16 14:11:30 +09:00
Slim 16092eb5e2 Merge pull request #2464 from gianm/print-properties
Make startup properties logging optional.
2016-02-14 15:11:35 -06:00
Gian Merlino e0c049c0b0 Make startup properties logging optional.
Off by default, but enabled in the example config files. See also #2452.
2016-02-12 14:12:16 -08:00
Himanshu Gupta da5fcd0124 before facts get it , indexAndOffsets should already know about it 2016-02-12 13:32:06 -06:00
Jonathan Wei d63eec65a1 Merge pull request #2208 from navis/metadataquery-minmax
Support min/max values for metadata query
2016-02-11 17:28:07 -08:00
Jonathan Wei e1b022eac9 Merge pull request #2349 from navis/dimensionspec-for-selectquery
Support dimension spec for select query
2016-02-11 16:38:16 -08:00
navis.ryu dd2375477a Support min/max values for metadata query (#2208) 2016-02-12 09:35:58 +09:00
Gian Merlino 2d037ef05e Merge pull request #2453 from DreamLab/fix/topn_sorting_anomaly
Fix for unstable behavior of HyperLogLog comparator
2016-02-11 16:05:34 -08:00
navis.ryu 4d63196535 Support dimension spec for select query 2016-02-12 08:54:28 +09:00
Himanshu 47d48e1e67 Merge pull request #2452 from gianm/print-properties
PropertiesModule: Print properties, processors, totalMemory on startup.
2016-02-11 16:49:34 -06:00
turu f277a54a5c removed unsafe heuristics from hll compareTo and provided unit test for regression 2016-02-11 23:46:24 +01:00
Slim 368988d187 Merge pull request #2291 from druid-io/lookupManager
Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.
2016-02-11 16:07:27 -06:00
Gian Merlino 29f7758e74 PropertiesModule: Print properties, processors, totalMemory on startup. 2016-02-11 13:51:08 -08:00
Slim Bouguerra 4e119b7a24 Adding lookup ref manager and lookup dimension spec impl 2016-02-11 12:11:51 -06:00
Jaehong Choi 2f2e2ff5b9 support alphanumeric sorting for dimensional columns in groupby 2016-02-11 17:31:28 +09:00
Keuntae Park 05a144e39a fix crash with filtered aggregator at ingestion time
- only for selector filter because extraction filter is not supported as
  cardinality is not fixed at ingestion time
2016-02-11 11:25:33 +09:00
Fangjin Yang b1673ee90e Merge pull request #2409 from gianm/smq-merged-thing
SegmentMetadataQuery: Retain segment id when merging, if possible.
2016-02-08 15:43:39 -08:00
Fangjin Yang c9c20bb7f3 Merge pull request #2395 from metamx/fixExtractionDimFilterNullTest
Actually check cache key null checking in ExtractionDimFilterTest
2016-02-08 14:10:52 -08:00
Gian Merlino bd9c04244f SegmentMetadataQuery: Retain segment id when merging, if possible.
This is helpful on realtime nodes, where two analyses from two different hydrants
are merged together but they are actually from the same segment.
2016-02-08 13:07:02 -08:00
Himanshu Gupta 9fe1b28ee5 provide configuration to enable usage of Off heap merging for groupBy query 2016-02-05 14:18:06 -06:00
Himanshu Gupta b40c342cd1 make Global stupid pool cache size configurable 2016-02-05 14:18:06 -06:00
Himanshu Gupta 72a1e730a2 OffheapIncrementalIndex updates to do the aggregation merging off-heap 2016-02-05 14:17:05 -06:00
Himanshu Gupta 907dd77483 OffheapIncrementalIndex a copy/paste of OnheapIncrementalIndex 2016-02-05 14:02:31 -06:00
Charles Allen aac5f9b2c9 Actually check cache key null checking in ExtractionDimFilterTest 2016-02-04 09:44:13 -08:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
Fangjin Yang da77591129 Merge pull request #2392 from metamx/fix2391
Allow ExtractionDimFilter value to be null
2016-02-03 17:47:14 -08:00
Charles Allen d4f00096ff Allow ExtractionDimFilter value to be null
* Fixes #2391
2016-02-03 15:51:47 -08:00
Himanshu Gupta 6e7d90cf56 UTs for DefaultLimitSpec 2016-02-03 15:59:12 -06:00
Himanshu Gupta 29e0d7f971 lazily create comparators for row columns when needed 2016-02-03 13:38:20 -06:00
navis.ryu 1d602be0f9 Replace string[] with int[] for dimensions 2016-02-03 15:03:22 +09:00
binlijin a5ef30ff84 optimize topn on particular situation 2016-02-02 14:20:09 +08:00
Himanshu 93c50d8538 Merge pull request #2094 from navis/simplify-index-merge
Simplifying dimension merging
2016-01-29 11:23:14 -06:00
navis.ryu 55a888ea2f time-descending result of select queries 2016-01-29 10:06:05 +09:00
navis.ryu dd774ef4dd one-pass merging of dictionary & index 2016-01-29 10:03:53 +09:00
Himanshu edd7ce58aa Merge pull request #2348 from AlexanderSaydakov/fix-aggregator-test-helper
fixed createIndex
2016-01-28 16:01:36 -06:00
saydakov e0860661b1 fixed createIndex 2016-01-28 13:20:50 -08:00
Nishant 99017f4518 Merge pull request #2326 from navis/use-reverse-iterator
use reverse-iterator if possible
2016-01-28 19:48:38 +05:30
Nishant 3880f54b87 Merge pull request #2332 from himanshug/configurable_partial
make populateUncoveredIntervals a configuration in query context
2016-01-28 10:34:35 +05:30
navis.ryu 7324ece8f9 use reverse-iterator if possible 2016-01-28 09:04:55 +09:00
Xavier Léauté 5a3642bb93 Merge pull request #2247 from metamx/pedanticBuild
Enable strict building in travis
2016-01-27 10:27:03 -08:00
Xavier Léauté 2e5004095a Merge pull request #2341 from gianm/smq-test
SegmentMetadataQuery: Fix merging of ColumnAnalysis errors.
2016-01-27 09:37:06 -08:00
Charles Allen 508734c8b0 Long constant reformatting in tests `l` --> `L` 2016-01-27 08:59:19 -08:00
Gian Merlino b1e6c01762 Make LookupExtractor abstract methods public, they have to work across classloaders. 2016-01-26 23:08:03 -08:00
Gian Merlino 795343f7ef SegmentMetadataQuery: Fix merging of ColumnAnalysis errors.
Also add tests for:
- ColumnAnalysis folding
- Mixed mmap/incremental merging
2016-01-26 17:16:26 -08:00
Himanshu Gupta 3719b6e3c8 make populateUncoveredIntervals a configuration in query context 2016-01-26 15:13:45 -06:00
Himanshu 3844658fb5 Merge pull request #2323 from druid-io/update-druidapi
Update druid-api to 0.3.16
2016-01-26 13:02:10 -06:00
Himanshu Gupta 09d3678667 adding single threaded indexing and querying test for IncrementalIndex 2016-01-23 00:17:14 -06:00
Charles Allen 0000b9fc62 Remove sorting in ProtoBufInputRowParserTest
Due to processing/src/test/java/io/druid/data/input/ProtoBufInputRowParserTest.java
2016-01-22 16:02:25 -08:00
Himanshu Gupta 2f7f5119cf older segments might not have field bitmapSerdeFactory for dimension columns and we must use appropriate default 2016-01-22 13:28:25 -06:00
binlijin 1d1f4d996d Merge pull request #2111 from binlijin/optimize-create-inverted-indexes
optimize create inverted indexes
2016-01-22 11:36:27 +08:00
binlijin 55f7dd4629 optimize create inverted indexes 2016-01-22 10:40:09 +08:00
Gian Merlino d416279c14 SegmentMetadataQuery support for returning aggregators. 2016-01-21 17:27:25 -08:00
Fangjin Yang 5a9cd89059 Merge pull request #2305 from gianm/segment-metadata-query-multivalues
Add StorageAdapter#getColumnTypeName, and various SegmentMetadataQuery adjustments
2016-01-21 17:22:34 -08:00
Gian Merlino e5913be90e Merge pull request #2257 from tubemogul/index-merge-bug
Adds support for empty merge metrics. fixes #2256
2016-01-21 16:38:00 -08:00
Gian Merlino 87c8046c6c Add StorageAdapter#getColumnTypeName, and various SegmentMetadataQuery adjustments.
SegmentMetadataQuery stuff:

- Simplify implementation of SegmentAnalyzer.
- Fix type names for realtime complex columns; this used to try to merge a nice type
  name (like "hyperUnique") from mmapped segments with the word "COMPLEX" from incremental
  index segments, leading to a merge failure. Now it always uses the nice name.
- Add hasMultipleValues to ColumnAnalysis.
- Add tests for both mmapped and incremental index segments.
- Update docs to include errorMessage.
2016-01-21 15:50:33 -08:00
Fangjin Yang 3f998117a6 Merge pull request #2306 from jon-wei/inherit2
More specific null/empty str handling in IndexMerger
2016-01-21 14:36:09 -08:00
Michael Schiff 1e44445f06 Adds support for empty merge metrics. fixes #2256 2016-01-21 13:21:37 -08:00
jon-wei 459a236067 More specific null/empty str handling in IndexMerger 2016-01-21 12:24:38 -08:00
Slim 201539260c Merge pull request #2076 from b-slim/issue_2010_upper_lower_extractionFN
adding lower and upper extraction fn
2016-01-21 09:58:07 -06:00
Slim Bouguerra 78feb3a13e adding lower and upper extraction fn 2016-01-21 08:59:05 -06:00
Gian Merlino 5a932d28c1 Merge pull request #2288 from tubemogul/index-merge-bug2
Null check in IncrementalIndexAdapter.getDimValueLookup()
2016-01-20 17:07:15 -08:00
Nishant 59ea186af7 fix reference counting for segments 2016-01-20 17:24:21 +05:30
Michael Schiff 50ceec78a2 null check in IncrementalIndexAdapter.getDimValueLookup() 2016-01-19 23:19:28 -08:00
jon-wei bc1e9b27c8 Consolidate IndexMergerTest and IndexMergerV9Test 2016-01-19 16:28:35 -08:00
jon-wei 747343e621 Preserve dimension order across indexes during ingestion 2016-01-19 13:34:11 -08:00
Fangjin Yang 0c31f007fc Merge pull request #1728 from himanshug/aggregators_in_segment_metadata
Store AggregatorFactory[] in segment metadata
2016-01-19 12:55:49 -08:00
Himanshu Gupta a99aef29a1 adding aggregators to segment metadata 2016-01-19 14:23:39 -06:00
Himanshu Gupta 52eb0f04a7 adding a new method getMergingFactory(..) to AggregatorFactory 2016-01-18 22:03:46 -06:00
Himanshu Gupta 77fc86c015 making AggregatorFactory abstract class 2016-01-18 22:03:46 -06:00
Himanshu Gupta 164b0aad7a removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class
instead of a Map to store segment metadata
2016-01-18 22:03:46 -06:00
zhxiaog 3459a202ce fixed #1873, add ability to express CONCAT as an extractionFn 2016-01-18 15:03:17 -08:00
Keuntae Park 238dd3be3c support cascade execution of extraction filters in extraction dimension spec 2016-01-18 11:10:19 +09:00
Fangjin Yang f6a1a4ae20 Merge pull request #2138 from KurtYoung/feature-build-v9
build v9 directly
2016-01-16 13:35:46 -06:00
Kurt Young 82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
Kurt Young 1f2168fae5 add IndexMergerV9
add unit tests for IndexMergerV9 and fix some bugs

add more unit tests and fix bugs

handle null values and add more tests

minor changes & use LoggingProgressIndicator in IndexGeneratorReducer

make some static class public from IndexMerger

minor changes and add some comments

changes for comments
2016-01-16 11:25:28 +08:00
Kurt Young bb50d2a2b2 add some streaming writers 2016-01-16 11:25:26 +08:00
Fangjin Yang e0932ba1c2 Merge pull request #2267 from himanshug/fix_topn_multi_val_filter
Remap id's returned in XXXFilteredDimensionSpec.getRow() as per reduced cardinality
2016-01-14 17:06:54 -08:00
Fangjin Yang 7704699b40 Merge pull request #2265 from navis/strlen-dimension-ignored
Strlen sort spec ignores dimension
2016-01-14 17:06:33 -08:00
Himanshu Gupta ae6a111444 fix XXXFilteredDimensionSpec to remap the dictionary encodings as per new cardinality 2016-01-13 22:25:02 -06:00
binlijin a3140b2548 fix topN filtering on multi-valued dimension bug 2016-01-13 22:25:02 -06:00
navis.ryu ea9fabdf2f Strlen sort spec ignores dimension 2016-01-14 11:05:44 +09:00
Fangjin Yang 4c014c1574 Merge pull request #2228 from metamx/incremental-index-mem2
Improve heap usage for IncrementalIndex
2016-01-13 14:48:03 -08:00
navis.ryu 18479bb757 time-descending result of timeseries queries 2016-01-13 12:23:01 +09:00
Fangjin Yang d7ad93debc Merge pull request #2221 from binlijin/topN_minTopNThreshold
Allow change minTopNThreshold per topN query
2016-01-12 16:22:20 -08:00
Nishant 4863e2ca4f cache metric selectors instead of creating new ones for every metric in each row
clear selectors on close.

Add comments about thread safety.
2016-01-13 00:45:23 +05:30
Nishant dfe6abb721 Merge pull request #2250 from himanshug/agg_test_helper_fix
remove redundant registering of json modules in AggregationTestHelper
2016-01-12 11:42:00 +05:30
navis.ryu 976ebc45c0 Simplify information in IncrementalIndex 2016-01-12 10:18:11 +09:00
Himanshu Gupta b973604bf8 remove redundant registering of json modules in AggregationTestHelper 2016-01-11 19:03:22 -06:00
Xavier Léauté 46a7f2660d fix casing to be consistent with other classes 2016-01-08 10:19:06 -08:00
Fangjin Yang d0b10c29d7 Merge pull request #2197 from metamx/clearIncIndexClose
Make OnHeapIncrementalIndex clean maps on close()
2016-01-07 15:43:47 -08:00
Gian Merlino 4ecd901a1a Merge pull request #2219 from himanshug/identity_extraction_fn_singleton
make IdentityExtractionFn singleton
2016-01-07 10:08:28 -08:00
Fangjin Yang aaea95ed1b Merge pull request #2207 from himanshug/theta_sketch_select_query
fix bug for thetaSketch metric not working with select queries
2016-01-07 09:46:09 -08:00
binlijin 010c6e959c add test 2016-01-07 18:01:46 +08:00
binlijin a6bfcc5bfd Allow change minTopNThreshold per topN query 2016-01-07 14:51:00 +08:00
Fangjin Yang 4cc81d3eff Merge pull request #2096 from b-slim/add_use_case_unapply
Add use case unapply
2016-01-06 21:58:12 -08:00
Himanshu Gupta 217079d0c7 make IdentityExtractionFn singleton 2016-01-06 22:29:07 -06:00
Himanshu 902f51433d Merge pull request #2125 from mangeshpardeshiyahoo/master
Add extraction function support for Dimension Selector
2016-01-06 14:22:26 -06:00
Mangesh Pardeshi 75ee952197 Add extraction function support for dimension Selector 2016-01-06 13:47:07 -06:00
Slim Bouguerra 032d3bf6e6 Optimization of extraction filter by reversing the lookup 2016-01-06 11:16:11 -06:00
Himanshu Gupta 3f048f0b15 adding support to execute Select queries in AggregationTestHelper so that Select query based UTs can be written for complex aggregator implementations 2016-01-05 21:54:55 -06:00
Charles Allen 91fc32749b Make OnHeapIncrementalIndex clean maps on close() 2016-01-04 11:18:16 -08:00
Himanshu Gupta b47d807738 Add support for filtering at DimensionSpec level so that multivalued dimensions can be filtered correctly
also adding UTs for multi-valued dimensions
2015-12-30 17:59:47 -06:00
Himanshu Gupta fa5c3bb014 adding decorate(DimensionSelector) to DimensionSpec to enable support for arbitrary filtering/transformations to returned dimension values 2015-12-30 15:06:24 -06:00
Nishant b68265399c Merge pull request #2168 from druid-io/remove-indexmaker
Remove IndexMaker
2015-12-30 12:24:29 +05:30
Fangjin Yang e14ad74088 Merge pull request #1936 from b-slim/between_range_with_predicat
adding Upper/Lower Bound Filter
2015-12-29 10:11:22 -08:00
fjy faf421726b remove IndexMaker 2015-12-28 14:19:02 -08:00
Gian Merlino 83f4130b5f SegmentMetadataQuery merging fixes.
- Fix merging when the INTERVALS analysisType is disabled, and add a test.
- Remove transformFn from CombiningSequence, use MappingSequence instead. transformFn did
  not work for "accumulate" anyway, which made the tests wrong (the intervals should have
  been condensed, but were not).
- Add analysisTypes to the Druids segmentMetadataQuery builder to make testing simpler.
2015-12-22 07:57:10 -08:00
Robin dded4441d3 for completeness, add unit test for groupby/having with unrecognized type 2015-12-21 12:06:56 -06:00
Himanshu Gupta e1631967e3 adding comments to explain merge failure in segmentMetadata query 2015-12-19 11:39:24 -06:00
Himanshu Gupta 7ecad1be24 Fix and UT for testing segment analysis merge 2015-12-19 00:24:02 -06:00
Fangjin Yang 7019d3c421 Merge pull request #2107 from jon-wei/fix_smq
More efficient SegmentMetadataQuery
2015-12-18 16:40:47 -08:00
Fangjin Yang 14229ba0f2 Merge pull request #1922 from metamx/jsonIgnoresFinalFields
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
Fangjin Yang 71f554bf80 Merge pull request #2101 from himanshug/fix_extraction_dim_filter_cache_key
add extractionFn bytes to cache key in ExtractionDimFilter
2015-12-18 12:05:43 -08:00
Fangjin Yang 9e6874cc7e Merge pull request #2084 from binlijin/master
minor optimize IndexMerger's MMappedIndexRowIterable
2015-12-18 11:42:55 -08:00
Bingkun cc21a5fac7 Merge pull request #1999 from himanshug/remove_min_max_aggs
remove min/max aggregator factory
2015-12-18 13:38:52 -06:00
jon-wei 356b07c6c3 More efficient SegmentMetadataQuery 2015-12-17 12:46:23 -08:00
Jonathan Wei f8cf84f466 Merge pull request #1995 from himanshug/num_rows_seg_metadata_query
add numRows to segment metadata query response
2015-12-17 12:23:46 -08:00
Himanshu Gupta 82ea348003 add extractionFn bytes to cache key in ExtractionDimFilter 2015-12-16 14:00:38 -06:00
Himanshu 628643d80e Merge pull request #2091 from rasahner/noDefaultForGroupbyHaving
take away default for groupBy/having
2015-12-16 01:07:40 -06:00
sahner 3441cf3110 take away default for groupBy/having 2015-12-15 10:32:45 -06:00
Fangjin Yang e7f06cf61c Merge pull request #2075 from jon-wei/regex_extract
Configurable value replacement on match failure for RegexExtractionFn
2015-12-14 19:10:50 -08:00
jon-wei c88f75df7c Configurable value replacement on match failure for RegexExtractionFn 2015-12-14 17:57:41 -08:00
binlijin 362bea1090 minor optimize IndexMerger's MMappedIndexRowIterable 2015-12-11 15:04:46 +08:00
Xavier Léauté d531e69d1a Merge pull request #2079 from binlijin/master
reduce bytearray copy to minimal optimize VSizeIndexedWriter
2015-12-10 21:30:09 -08:00
Slim Bouguerra 77afdf25e3 adding Bound Filter 2015-12-10 08:47:21 -06:00
Slim Bouguerra ee1a39801a adding bulk lookup and reverse lookup 2015-12-10 08:29:41 -06:00
binlijin 0eafbd55b2 reduce bytearray copy to minimal optimize VSizeIndexedWriter 2015-12-10 16:34:39 +08:00
Fangjin Yang f4ba13a1ac Merge pull request #2029 from b-slim/add_reverse_fn
Adding reverse lookup function to LookupExtractor.
2015-12-09 12:50:13 -08:00
Xavier Léauté 9015a68c03 Merge pull request #2002 from navis/DRUID-2001
fixed #2001 GenericIndexed.fromIterable compares all values even when it's not sorted
2015-12-09 08:56:49 -08:00
Slim Bouguerra 85f339b687 introduction and implem of reverse lookup function unApply. 2015-12-09 10:02:57 -06:00
Nishant 6c23d8edb4 Merge pull request #2043 from mangeshpardeshiyahoo/master
Add dimension selector support for groupby/having filters
2015-12-08 12:08:53 +05:30
Mangesh Pardeshi d7ce120929 Add dimension selector support for groupby/having quries 2015-12-08 01:51:11 +00:00
Himanshu Gupta 431469e9c1 remove min/max aggregator factory which are replaced by double[min/max] aggregator factories 2015-12-05 22:36:49 -06:00
Himanshu Gupta 62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Gian Merlino d21a640695 Merge pull request #2034 from b-slim/fix_cache_key
Fix getCacheKey for DimFilters
2015-12-04 09:13:06 -08:00
Slim Bouguerra fb4ff3cf54 fix getCacheKey 2015-12-04 08:07:08 -06:00
Charles Allen 9d02f47201 Update IncrementalIndexTest copyright notice 2015-12-03 18:03:08 -08:00
Charles Allen be8c6fafb0 Merge pull request #2017 from tubemogul/issue/63
fixes issue #63
2015-12-03 18:01:11 -08:00
Gian Merlino 045df54404 Merge pull request #1961 from metamx/druidMetricsVersion
Add the druid artifact version to metrics when emitted
2015-12-03 17:34:57 -08:00
Michael Schiff b6cc2428e1 fixes issue #63 2015-12-03 17:30:47 -08:00
Himanshu 0eab8417cb Merge pull request #2008 from codingwhatever/regex-search-query
Regex search query
2015-12-03 09:57:34 -06:00
Sam Groth 596b7ebd9a Adding RegexSearchQuerySpec 2015-12-03 09:16:02 -06:00
Himanshu d02be6194d Merge pull request #1967 from metamx/realtime-metrics-improvements
Add datasource and taskId to metrics emitted by peons
2015-12-02 23:48:13 -06:00
Himanshu 00c6027777 Merge pull request #1986 from metamx/substring
fixes #1874 adding a substring extraction function, tests, and documentation
2015-12-02 23:45:47 -06:00
Clint Wylie 68ef5f437a fixes #1874 adding a substring extraction function, tests, and documentation 2015-12-01 23:50:32 -08:00
navis.ryu 87357a0534 fixed #2001 GenericIndexed.fromIterable compares all values even when it's not sorted 2015-12-02 15:11:14 +09:00
Nishant 1eb8211346 Add datasource and taskId to metrics emitted by peons
This PR adds the datasource and taskId to the jvm and sys metrics
emitted by the peons.

fix spelling

review comment

review comment
2015-12-01 23:20:59 +05:30
Gian Merlino cd2cff24ff Fix serde for FragmentSearchQuerySpec and add some tests. 2015-11-30 17:34:35 -08:00
navis.ryu c73418c181 fixed #2003 ColumnSelectorBitmapIndexSelector throws NPE for dimension not supporting bitmap 2015-11-24 10:45:36 +09:00
Himanshu Gupta 7a89b2e1a6 add numRows to segment metadata query response 2015-11-20 01:25:02 -06:00
Himanshu d93640bfcb Merge pull request #1974 from jon-wei/dim_order_merge
Allow IndexMerger to use non-lexicographic dim order when merging indexes
2015-11-18 19:51:34 -06:00
Xavier Léauté e3e6159336 Merge pull request #1985 from metamx/FixLookupCacheKey
Change LookupExtractionFn cache key to be unique
2015-11-18 10:13:55 -08:00
Charles Allen 7abe999418 Change LookupExtractionFn cache key to be unique 2015-11-17 18:02:40 -08:00
jon-wei 4afc62be29 Allow IndexMerger to use non-lexicographic dim order when merging indexes 2015-11-17 13:02:31 -08:00
Xavier Léauté d7eb2f717e enable query caching on intermediate realtime persists 2015-11-17 10:58:00 -08:00
Gian Merlino 57f213d536 Better toString for groupBy, segmentMetadata queries. 2015-11-16 12:54:59 -08:00
jon-wei cdceaf2d26 Fix IncrementalIndexAdapter getRows() Iterable 2015-11-12 13:10:42 -08:00
Charles Allen af34e9c8cb Add the druid artifact version to metrics when emitted 2015-11-12 12:11:27 -08:00
binlijin 286b8f8c6f optimize index merge 2015-11-12 11:08:54 +08:00
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
dclim fd0935ecb9 fix spatial dimension transformer to work with hadoop 2015-11-10 19:16:51 -07:00
Slim Bouguerra c511273efd adding in filter 2015-11-06 16:23:24 -06:00
Charles Allen 929b981710 Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to 2015-11-05 18:10:13 -08:00
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Gian Merlino 8defe29270 Merge pull request #1901 from guobingkun/fix_typo_and_rename
Fix metadata typo and rename default extension directory
2015-11-03 14:02:11 -08:00
Bingkun Guo 962f65cc76 fix metadata typo and rename default extension directory 2015-11-03 14:50:42 -06:00
Fangjin Yang cec09a9967 Merge pull request #1804 from himanshug/objectify_index_creators
static to non-static conversion for methods in Index[Merger/Maker/IO]
2015-11-03 11:25:32 -08:00
Himanshu Gupta 8b67417ac8 make methods in Index[Merger,Maker,IO] non-static so that they can have
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
navis.ryu e03fc2032f changed equals/hashCode implementation 2015-11-02 17:21:35 +09:00
navis.ryu 69c86716d6 addressed comments 2015-11-02 14:23:13 +09:00
navis.ryu 032c3e986d Make 'search' filter have a case sensitive option(#1878) 2015-10-30 16:38:54 +09:00
Fangjin Yang 25a0eb7ed5 Merge pull request #1799 from dclim/nested-groupby-aggregator-fix
Support multiple outer aggregators of same type and provide more help…
2015-10-29 18:01:31 -07:00
Xavier Léauté 59872bd0cd Merge pull request #1809 from metamx/fifoPriorityExecutorService
Make PrioritizedExecutorService optionally FIFO
2015-10-27 15:19:32 -07:00
Charles Allen 060402a216 Merge pull request #1855 from himanshug/fix_having_specs
fix [GreaterThan,LessThan,Equals] HavingSpecs
2015-10-27 14:46:04 -07:00
Charles Allen ecdafa87c5 Make PrioritizedExecutorService optionally FIFO 2015-10-27 14:16:22 -07:00
Himanshu Gupta a71c7270b9 making [GreaterThan,LessThan,Equals] HavingSpecs more robust by carefully using long vs float for comparison 2015-10-27 13:15:13 -05:00
Fangjin Yang 5a082b2f5e Merge pull request #1824 from metamx/UniformGranularitySpecHashEquals
Add hashCode and equals to UniformGranularitySpec
2015-10-26 09:34:01 -07:00
Fangjin Yang 5f23703216 Merge pull request #1638 from guobingkun/remove_maven_client_code
Remove Maven client at runtime + Provide a way to load Druid extensions through local file system
2015-10-26 09:30:05 -07:00
Nishant 7cecc55045 Add segment merge time as a metric
Add merge and persist cpu time

Fix typo

review comment

move cpu time measuring to VMUtils

review comments.
2015-10-22 12:28:03 +05:30
Bingkun Guo 4914925d65 New extension loading mechanism
1) Remove maven client from downloading extensions at runtime.
2) Provide a way to load Druid extensions and hadoop dependencies through file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documents on how to use this new extension loading mechanism.
5) Change the way how Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0
are packaged within the Druid tarball.
2015-10-21 14:22:36 -05:00
Xavier Léauté e4ac78e43d bump next snapshot to 0.9.0 2015-10-20 13:46:13 -07:00
dclim 46ecdfa757 add comment explaining logic 2015-10-15 16:04:06 -06:00
Xavier Léauté 4c2c7a2c37 update version to 0.8.3 2015-10-14 21:40:55 -07:00
Charles Allen f432b8e3f9 Add hashCode and equals to UniformGranularitySpec
* Also add hashCode != 0 to AllGranularity and NoneGranularity
2015-10-13 16:42:21 -07:00
Gian Merlino c9d6994040 Merge pull request #1821 from himanshug/storage_adapter_update
cache max data timestamp in QueryableIndexStorageAdapter
2015-10-13 10:52:43 -07:00
Himanshu Gupta 490de1f98a support multiple non-consecutive intervals in outer query of nested group-by 2015-10-13 10:16:06 -05:00
Himanshu Gupta fbba30eb60 cache max data timestamp in QueryableIndexStorageAdapter so that TimestampCheckingOffset
does not have to get it per cursor.
2015-10-12 15:34:22 -05:00
Charles Allen 8ed5d2c06a Add hashCode and equals to stock lookups 2015-10-12 10:29:39 -07:00
Himanshu Gupta 2737fd83f5 in the IndexSizeExceededException put maxRowCount to confirm if it is correctly picked up from configuration 2015-10-06 15:23:14 -05:00
Himanshu Gupta 8654732ef6 make IndexSizeExceededException constructor take formatString and arguments than just fixed String
like ISE, IAE etc
2015-10-06 13:44:22 -05:00
dclim f4e0a76820 Support multiple outer aggregators of same type and provide more helpful exception when the same inner aggregator is referenced by multiple types of outer aggregators 2015-10-01 15:15:12 -06:00
Gian Merlino 774765dc40 GroupByQueryRunnerTest for hyperUnique finalizing post aggregators 2015-10-01 00:09:29 -04:00
Gian Merlino e3bb93e8c7 Revert "Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2"
This reverts commit dae488b7c0, reversing
changes made to 397be4b897.
2015-10-01 00:05:59 -04:00
dclim 8e20a1e1f3 Use DoubleSumAggregatorFactory instead of CountAggregatorFactory, add test for non-integers 2015-09-30 17:11:39 -06:00
David Lim 70ae5ca922 Fix failure in nested groupBy with multiple aggregators with same fieldName
Version 2 - Throws an exception if an outer query references an
aggregator that doesn't exist in the inner query, and then uses the
inner query aggregator names to form the columns for the intermediate
incremental index.

Also deleted all the getRequiredColumns() methods which are no longer
being used.

We do something wacky by adding an aggregator factory for the post
aggregators when building the intermediate incremental index, otherwise
queries on post aggregate results fail because the data isn't in the
incremental index.

Closes #1419
2015-09-30 15:43:11 -06:00
Charles Allen 8199ecf1a4 Merge pull request #1782 from jon-wei/smq_cachekey
Add analysisTypes to SegmentMetadataQuery cache key
2015-09-29 15:51:35 -07:00
jon-wei 41ff271339 Add analysisTypes to SegmentMetadataQuery cache key 2015-09-29 14:33:35 -07:00
Charles Allen 2d847ad654 Merge pull request #1730 from metamx/union-queries-fix
fix #1727 - Union bySegment queries fix
2015-09-29 12:23:25 -07:00
Nishant 573aa96bd6 fix #1727 - Union bySegment queries fix
Fixes #1727.
revert to doing merging for results for union queries on broker.

revert unrelated changes

Add test for union query runner

Add test

remove unused imports

fix imports

fix renamed file

fix test

update docs.
2015-09-29 23:32:36 +05:30
Gian Merlino 62d4ced4dd Separate ListColumnIncluderator cache key parts with nul bytes 2015-09-29 13:59:58 -04:00
jon-wei e6a6284ebd Allow SegmentMetadataQuery to skip cardinality and size calculations 2015-09-22 13:51:55 -07:00
Gian Merlino aaa8a88464 Merge pull request #1739 from jon-wei/segment_realtime
Allow SegmentAnalyzer to read columns from StorageAdapter, allow SegmentMetadataQuery to query IncrementalIndexSegments on realtime node
2015-09-17 18:36:53 -07:00
Charles Allen df4c2bab10 Soften concurrency requirements on IncrementalIndexTest 2015-09-17 15:51:07 -07:00
jon-wei 367c50d4ba Allow SegmentAnalyzer to read columns from StorageAdapter, allow SegmentMetadataQuery to query IncrementalIndexSegments on realtime node 2015-09-16 18:39:31 -07:00
Charles Allen 6e1eb3b7fe Add better concurrency testing to IncrementalIndexTest 2015-09-16 14:04:20 -07:00
Gian Merlino 9705c5139b Merge pull request #1732 from jon-wei/segmentmeta
Add support for a configurable default segment history period for segmentMetadata queries and GET /datasources/<datasourceName> lookups
2015-09-16 12:36:25 -07:00
Fangjin Yang 8b071a7230 Merge pull request #1710 from metamx/incrementalIndexConcurrentTestLatching
Add some basic latching to concurrency testing in IncrementalIndexTest
2015-09-15 13:55:52 -07:00
jon-wei 193fb4fdfc Add support for a configurable default segment history period for segmentMetadata queries and GET /datasources/<datasourceName> lookups 2015-09-14 19:41:42 -07:00
Charles Allen bd605a097e Merge pull request #1731 from metamx/regex-extraction-npe
fix NPE with regex extraction function
2015-09-14 15:55:05 -07:00
Xavier Léauté 08a527d01a fix NPE with regex extraction function 2015-09-14 14:45:30 -07:00
Charles Allen e569f4b6a7 Add dimension extraction functionality to SearchQuery
* Add IdentityExtractionFn
2015-09-14 11:36:15 -07:00
Himanshu 5ff92664f8 Merge pull request #1696 from metamx/cpuTimeReporting
Add CPU time to metrics for segment scanning.
2015-09-14 10:53:55 -05:00
Fangjin Yang 34ef81572d Merge pull request #1700 from himanshug/update_agg_test_helper
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Charles Allen 8d3cdd8572 Don't check for sortedness if we already know GenericIndexedWriter isn't sorted 2015-09-11 16:32:09 -07:00
Charles Allen d6849805ea Add some basic latching to concurrency testing in IncrementalIndexTest 2015-09-10 10:06:51 -07:00
Himanshu Gupta 5da58e48e0 use Rule based TemporaryFolder for cleanup of temp directory/files 2015-09-09 11:10:33 -05:00
Himanshu Gupta 44911039c5 update indexing in the helper to use multiple persists and final merge to
catch further issues in aggregator implementations
2015-09-09 11:10:33 -05:00
Charles Allen fcf5cae81d Add CPU time to metrics for segment scanning. 2015-09-08 13:34:19 -07:00
cheddar 4f61b42f40 Merge pull request #1578 from b-slim/fix_extraction_filter_2
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Himanshu 04ff6cd355 Merge pull request #1685 from gianm/close-loudly
Close output streams and channels loudly when creating segments.
2015-08-28 23:32:22 -05:00
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Gian Merlino 7d6fa2ba50 Close output streams and channels loudly when creating segments. 2015-08-28 17:14:03 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen c1388a1685 Merge pull request #1632 from Hailei/fix-subquery-innerquery-demension
Inner Query  should build on sub query
2015-08-27 10:25:38 -07:00
Gian Merlino 2a866f49df Downgrade Jackson to 2.4.6. 2015-08-26 18:25:55 -07:00
Charles Allen 24aa762c79 Add test for #1632 2015-08-25 20:50:30 -07:00
Xavier Léauté 51f6a9a2c9 update jackson to 2.6.1 2015-08-25 16:07:01 -07:00
Himanshu Gupta c57c07f28a add ability for client code to provide InputStream of input data in addition to File
It would be needed when input data file does not reside in the same jar
but you could still use getResourceAsStream() to read the data inside a file
2015-08-20 00:54:58 -05:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Slim Bouguerra 7549f02578 support the case filter value is null 2015-08-17 15:09:37 -05:00
zhanghailei 234a958817 Inner Query should build on sub query 2015-08-17 18:18:26 +08:00
Charles Allen db19d2d547 Revert "Update to guice 4.0" 2015-08-14 09:26:07 -07:00
Charles Allen be89105621 Merge pull request #1602 from metamx/more-code-cleanup
Some perf Improvements in Broker
2015-08-11 13:51:49 -07:00
Xavier Léauté fbdb841928 Merge pull request #1603 from metamx/optimize-lexicographic-topN
Optimizations for LexicographicTopNs
2015-08-11 13:35:34 -07:00
Nishant b8d8a8da9e Optimisations for LexicographicTopNs
initial review for perf optimizations for lexicographic TopNs

fix compilation

create map with proper size

review comment

review comment

review comments
2015-08-12 00:37:48 +05:30
Charles Allen 7e61216287 Update to guice 4.0
- Mark a lot of `@Provides` methods as final since guice 4.0 disallows overriding them
2015-08-10 13:57:18 -07:00
Slim Bouguerra f0bc362981 clean code if is not needed anymore 2015-08-07 12:38:41 -05:00
Slim Bouguerra 64d638a386 optimize makeMatcher 2015-08-06 17:04:36 -05:00
Nishant 1a46c4c71c avoid creating mergeSeqence when not required 2015-08-06 14:25:13 +05:30
Slim Bouguerra 83de5a4716 addressing reviewers comments 2015-08-03 09:03:28 -05:00
Slim Bouguerra dda0790a60 Fix extractionFilter by implementing make matcher
Fix getBitmapIndex to consider the case were dim is null
Unit Test for exractionFn with empty result and null_column
UT for TopN queries with Extraction filter
refactor in Extractiuon fileter makematcher for realtime segment and clean code in b/processing/src/test/java/io/druid/query/groupby/GroupByQueryRunnerTest.java
fix to make sure that empty string are converted to null
2015-08-03 09:02:17 -05:00
Himanshu Gupta d11d9b6c45 dont waste memory in storing all lines from input
CharSource.readLines() reads all lines from input into a in-memory list
Since we need an iterator here, so this wastage can be easily prevented
2015-07-20 21:59:38 -05:00
Fangjin Yang 0481c8ca26 Merge pull request #1406 from zhaown/fix-breaking-while-exceeding-max-intermediate-rows
Fix breaking while exceeding max intermediate rows.
2015-07-20 13:41:22 -07:00
Himanshu Gupta f7a92db332 generic byte[] serde for InputRow 2015-07-20 12:01:53 -05:00
Himanshu Gupta 0439e8ec23 adding serde methods for intermediate aggregation object to ComplexMetricSerde
This provides the alternative to using ComplexMetricSerde.getObjectStrategy()
and using the serde methods from ObjectStrategy as that usage pattern is deprecated.
2015-07-20 12:01:53 -05:00
zhaown 524b05f073 Fix breaking while exceeding max intermediate rows. 2015-07-19 10:41:53 +08:00
Fangjin Yang e21195f987 Merge pull request #1469 from guobingkun/table_config
Inconsistent property names for "druid.metadata.storage.tables.xxx"
2015-07-17 07:43:19 -07:00
Himanshu 19af3bc9bc Merge pull request #1535 from metamx/alphanum-docs-tests
Update alphanumeric sort docs + more tests / examples
2015-07-16 22:09:41 -05:00
Xavier Léauté 2c464ad936 correct reference in docs + more tests / examples 2015-07-16 19:50:05 -07:00
Xavier Léauté 9616c10b1d remove import static 2015-07-16 17:46:21 -07:00
Xavier Léauté c1308203b8 Merge pull request #1532 from metamx/fixTopNDimExtractionDoubleApply
Fix TopN dimension extractions being applied twice
2015-07-16 13:39:02 -07:00
Xavier Léauté 3a0793aaf9 Merge pull request #1533 from metamx/extraCheckGroupByDimExtraction
Add more unit tests for group by
2015-07-15 21:09:00 -07:00
Charles Allen 7d0b77c261 Add more unit tests for group by 2015-07-15 20:15:21 -07:00
Xavier Léauté a15a2c4047 fix histogram aggregator cache key 2015-07-15 17:33:36 -07:00
Charles Allen 9092c665b7 Fix TopN dimension extractions being applied twice 2015-07-15 16:58:15 -07:00
Charles Allen 456ad9ffba Merge pull request #1529 from metamx/update-versions
inrement version
2015-07-15 13:25:31 -07:00
Xavier Léauté 4cfb00bc8a inrement version 2015-07-15 13:09:05 -07:00
Charles Allen 5eadd395e2 Move lots of executor service creation to Execs 2015-07-14 15:38:49 -07:00
Nishant 184b12bee8 fix groupBy caching to work with renamed aggregators
Issue - while storing results in cache we store the event map which
contains aggregator names mapped to values. Now when someone fire same
query after renaming aggs, the cache key will be same but the event
will contain metric values mapped to older names which leads to wrong
results.
Fix - modify cache to not store raw event but the actual list of values
only.

review comments + fix dimension renaming

review comment
2015-07-09 11:48:26 +05:30
Xavier Léauté 9789417612 ModuleList is already part of Initialization 2015-07-01 11:37:40 -07:00
Xavier Léauté 2c463ae435 Merge pull request #1489 from metamx/moveTestPackages
Move some test packages
2015-07-01 11:18:09 -07:00
Charles Allen 5e19a615f1 Add coments to DimExtractionTopNAlgorithm 2015-07-01 10:32:45 -07:00
Charles Allen 7a2a8a3d6e Move extraction tests to more reasonable package 2015-07-01 10:30:50 -07:00
Bingkun Guo 4a0ae7d8d5 Fix inconsistent druid property names for "druid.metadata.storage.tables.xxx" between document and code 2015-06-29 10:12:30 -05:00
Xavier Léauté 28fa1642b9 add node time metrics to DirectDruidClient 2015-06-26 17:57:44 -07:00
Xavier Léauté 36b4453789 Merge pull request #1455 from druid-io/fix-protobuf
Fix protobuf impl and docs
2015-06-22 23:15:40 -07:00
nishant f9cdb0ad61 test for #1120
Make the changes described in #1120 to add test for the issue described
there.
2015-06-21 23:34:21 +05:30
fjy 9c74993559 fix protobuf impl and docs 2015-06-20 21:59:38 -07:00
Xavier Léauté 0a5bb909a2 [maven-release-plugin] prepare for next development iteration 2015-06-18 17:35:19 -07:00
Xavier Léauté 59c6b2b279 [maven-release-plugin] prepare release druid-0.8.0-rc1 2015-06-18 17:35:14 -07:00
Charles Allen 6230ac90ae Use IndexMerger for conversion 2015-06-10 11:34:58 -07:00
Xavier Léauté 395ba79f8b Merge pull request #1403 from metamx/mergerMakerTests
Improvements around resource handling in IndexMerger / IndexIO / QueryableIndex
2015-06-04 15:59:10 -07:00
Charles Allen ed8eb5c991 Improvements around resource handling in IndexMerger / IndexIO / QueryableIndex
* Fix resource leak in `io.druid.segment.IndexIO.DefaultIndexIOHandler#validateTwoSegments(java.io.File, java.io.File)`
* Un-deprecate `close()` in `QueryableIndex` and make it inherit `Closeable`
* Fix resource leaks in various unit tests
* Add `CloserRule` for closing out resources
2015-06-04 14:18:27 -07:00
Himanshu 50ad0e6474 Merge pull request #1412 from pjain1/alphaNumericTopN_NPE_fix
NPE fix for TopN query with alphaNumericTopN metric spec
2015-06-04 09:49:31 -05:00
Parag Jain a7b09e857c NPE fix for alphaNumericTopN when pervious stop is not specified 2015-06-04 09:30:31 -05:00
Xavier Léauté 35e2fde18e Merge pull request #1386 from himanshug/aggregation_testing1
General class for testing any Aggregation Implementation
2015-06-03 23:43:36 -07:00
Xavier Léauté 92d7316ed8 Merge pull request #1414 from metamx/timeout2TIMEOUT
Replace "timeout" with QueryContextKeys.TIMEOUT
2015-06-02 17:11:09 -07:00
Charles Allen 1c4d42bc15 Replace "timeout" with QueryContextKeys.TIMEOUT 2015-06-02 14:49:21 -07:00
Charles Allen f48db09e35 Add optimizations for ExtractionFn by enabling MANY_TO_ONE vs ONE_TO_ONE codepaths
* Also adds LookupExtractionFn and MapLookupExtractor which takes in an explicit mapping of renames
* Add injective to javascript extraction fn
2015-06-02 12:22:56 -07:00
Himanshu Gupta 215c1ab01e UTs for hyperUnique aggregation 2015-06-01 12:52:40 -05:00
Himanshu Gupta 160d5fe6b7 a general class for testing any [complex] aggregation implementation 2015-06-01 12:52:40 -05:00
Charles Allen 55292bba13 Add more IndexMergerTests 2015-05-28 18:18:20 -07:00
Charles Allen 1ebe622c7d Add checkin GroupByQuery for null DimensionSpec in dimension list 2015-05-28 14:55:34 -07:00
Xavier Léauté f9c624c7db Merge pull request #1361 from mrijke/groupby-limithavingorder-unittest
GroupBy Query with Having/Limit/Orderingspec inconsistencies (UnitTest)
2015-05-27 14:49:18 -07:00
Xavier Léauté 1a3f04f0ed Merge pull request #1354 from metamx/multi-valued-dimension-compression
Enabling compression for multiValued dimension
2015-05-26 23:43:53 -07:00
Charles Allen fd64c24e43 Fix roaring extraction filter on empty values 2015-05-26 13:54:18 -07:00
nishant 81415282aa Enabling compression for multiValued dimension
Add test and refactoring

Add benchmark tests
2015-05-27 00:09:14 +05:30
Charles Allen e97d22a10a Fix Extraction Filter cast problems for empty results 2015-05-22 15:20:11 -07:00
Charles Allen e1399b7ce4 Add unit test to show breaking Dimension Extraction Filter 2015-05-22 15:02:11 -07:00
Xavier Léauté 75c092ccb1 Merge pull request #1375 from metamx/MetricManipulatorFnInstances
Modify MetricManipulatorFns to use instanced classes
2015-05-22 15:56:47 -04:00
Charles Allen 042653ebcb Modify MetricManipulatorFns to use instanced classes 2015-05-22 12:38:38 -07:00
Himanshu Gupta 723df735e9 force eagerness of processing of SegmentMetadata queries on the processing executor by converting the Sequence into List 2015-05-22 13:46:26 -05:00
Himanshu Gupta 5852b64852 adding UT for SegmentMetadata bySegment query which catches following regression caused by commit 55ebf0cfdf
it fails when we issue the SegmentMetadataQuery by setting {"bySegment" : true} in context with exception -
java.lang.ClassCastException: io.druid.query.Result cannot be cast to io.druid.query.metadata.metadata.SegmentAnalysis
at io.druid.query.metadata.SegmentMetadataQueryQueryToolChest$4.compare(SegmentMetadataQueryQueryToolChest.java:222) ~[druid-processing-0.7.3-SNAPSHOT.jar:0.7.3-SNAPSHOT]
at com.google.common.collect.NullsFirstOrdering.compare(NullsFirstOrdering.java:44) ~[guava-16.0.1.jar:?]
at com.metamx.common.guava.MergeIterator$1.compare(MergeIterator.java:46) ~[java-util-0.27.0.jar:?]
at com.metamx.common.guava.MergeIterator$1.compare(MergeIterator.java:42) ~[java-util-0.27.0.jar:?]
at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) ~[?:1.7.0_80]
2015-05-22 13:45:54 -05:00
Himanshu Gupta da0cc32bc8 Revert commit 55ebf0cfdf
which caused following regression
 it fails when we issue the SegmentMetadataQuery by setting {"bySegment" : true} in context with exception -
java.lang.ClassCastException: io.druid.query.Result cannot be cast to io.druid.query.metadata.metadata.SegmentAnalysis
at io.druid.query.metadata.SegmentMetadataQueryQueryToolChest$4.compare(SegmentMetadataQueryQueryToolChest.java:222) ~[druid-processing-0.7.3-SNAPSHOT.jar:0.7.3-SNAPSHOT]
at com.google.common.collect.NullsFirstOrdering.compare(NullsFirstOrdering.java:44) ~[guava-16.0.1.jar:?]
at com.metamx.common.guava.MergeIterator$1.compare(MergeIterator.java:46) ~[java-util-0.27.0.jar:?]
at com.metamx.common.guava.MergeIterator$1.compare(MergeIterator.java:42) ~[java-util-0.27.0.jar:?]
at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) ~[?:1.7.0_80]
2015-05-22 13:39:34 -05:00
Maarten Rijke 82da479464 Fix for GroupBy with Having+Limit+Orderspec
* Inverted function arguments to compose postProcFn for GroupBy queries
    with havingspec + limitspec.
  * Replaced query.getLimitSpec() with null in GroupByQueryToolChest's
    mergeGroupByResults
  * Added unittest to verify functionality
2015-05-19 18:35:48 +02:00
Himanshu Gupta 2fd3e9e8e5 return size = 0 in ColumnAnalysis if its unknown
that is if complex agg did not implement inputSizeFn() so
that segment metadata query shows atleast some information.
also instead of COMPLEX, return type of data stored.
2015-05-15 20:11:56 -05:00
Xavier Léauté 3c3db7229c Merge pull request #1355 from himanshug/long_max_min_aggregators
Long max/min aggregators
2015-05-13 12:08:11 -07:00
Himanshu Gupta cebb550796 additional UTs for [DoubleMax/DoubleMin] aggregation 2015-05-13 09:25:41 -05:00
Himanshu Gupta d0ec945129 adding aliases doubleMax and doubleMin for max and min respectively
renamed all [Max/Min]*.java to [DoubleMax/DoubleMin]*.java and created [Max/Min]AggregatorFactory.java which can be removed when we dont need the min/max aggregator type backward compatibility
2015-05-13 09:25:41 -05:00
Himanshu Gupta 2de38f7d29 UTs for long[Max/Min] aggregation 2015-05-13 09:25:22 -05:00
Himanshu Gupta 00436f93e2 long max/min aggregators implementation 2015-05-13 09:25:22 -05:00
fjy 7a6acf5c1b update pom to 0.8 2015-05-11 19:41:58 -06:00
Xavier Léauté 33265d63e1 Merge pull request #1262 from metamx/fix-null-dimension
fix handling of dimension having only null values
2015-05-06 13:51:26 -07:00
nishant 34be1e96fa fix NPE
review comments

Add test

fix test for java8
2015-05-05 23:11:13 +05:30
Neo 8f8400e24e fix handling of dimension having only null values
fixes #1211

fix value matcher

more improvements

more fixes for partial null column

fix handling of dimension having only null values

fixes #1211

fix value matcher

more improvements

more fixes for partial null column

review comment

IndexMaker speedups
* About 15% speedup

Conflicts:
	processing/src/main/java/io/druid/segment/IndexMaker.java

fix handling of dimension having only null values

fixes #1211

fix value matcher

more improvements

more fixes for partial null column

fix handling of dimension having only null values

fixes #1211

fix value matcher

more improvements

more fixes for partial null column

review comment

review comments

review comment

fix failing tests

review comment

fix compilation
2015-05-04 22:07:45 +05:30
nishant 50158357ff fixes #1330
fixes #1330,
Avoid creating Period instance as creating a Period from Long.MAX_VALUE
throws arithmetic exception.
After this query metric will emit duration in seconds instead of
minutes.
2015-05-04 20:34:28 +05:30
Xavier Léauté 721505c017 Merge pull request #1208 from druid-io/rework-metrics
Schemaless metrics + additional metrics for things we care about
2015-04-27 15:04:54 -07:00
fjy 963e5765bf Schemaless metrics + additional metrics for things we care about 2015-04-27 13:39:40 -07:00
Charles Allen 27016c0289 Fix IndexIO segment validator to account for timestamp mismatches. 2015-04-27 12:42:16 -07:00
Charles Allen 633fdb029e Add option to ConvertSegmentTask to skip validation
* Validation is enabled by default
2015-04-27 08:37:55 -07:00
Charles Allen 303727e6a9 IndexMaker speedups
* About 15% speedup

Conflicts:
	processing/src/main/java/io/druid/segment/IndexMaker.java
2015-04-23 13:19:21 -07:00
Charles Allen f2300430d1 Cleanup some code in index creation.
* Add some unit tests
* Add io.druid.segment.IndexMerger.reprocess for quick re-indexing of data
* Add dim-value validation to validation checker (instead of ONLY index #)
* General code refactoring to make things a little easier to read
2015-04-23 12:41:42 -07:00
Xavier Léauté 7939f43681 Merge pull request #1296 from druid-io/limit-test
Add test for order by metric and limit across multiple days
2015-04-22 11:28:06 -07:00
fjy 97d87a06d0 Add another test for limit across multiple days 2015-04-22 11:27:37 -07:00
Fangjin Yang 28f69d6bd3 Merge pull request #1299 from metamx/improve-filter-datasource-metadata
Improve filtering of segments for dataSourceMetadataQuery
2015-04-22 11:07:35 -07:00
Xavier Léauté a0a28de551 fix serde issue when pulling timestamps from cache 2015-04-22 11:03:26 -07:00
Xavier Léauté 2b4406671e Merge pull request #1301 from druid-io/fix-type
fix count agg factory type
2015-04-21 09:24:20 -07:00
fjy 7805357ab1 fix count agg factory type 2015-04-21 09:23:04 -07:00
nishant bb8c0cb50b Improve filtering of segments for dataSourceMetadataQuery
dataSourceMetadataQuery only needs to be executed on latest segments at
present, modify filterSegments and add test.
2015-04-21 09:31:13 +05:30
Xavier Léauté f73f14ab91 Merge pull request #1297 from metamx/versionConverterTaskUpdates
Update VersionConverterTask for IndexSpec and allowing Forced updates
2015-04-20 16:44:35 -07:00
Charles Allen 7479ac9012 Update VersionConverterTask for IndexSepc and allowing Forced updates 2015-04-20 16:17:06 -07:00
fjy d260515a43 update druid-api version 2015-04-17 14:58:35 -07:00
Bingkun Guo cf155e4eba Fix an issue that after broker forwards GroupByQuery to historical, havingSpec is still applied
on postAggregations which are removed in the forwarded query.

Add a unit test to replicate the issue.
Add a query that can replicate this issue into integration test.
2015-04-17 13:00:41 -05:00
fjy f0a19349bf fix up some comments for contributed test 2015-04-16 15:07:09 -07:00
Fangjin Yang 90b17a5259 Merge pull request #1285 from venkateshk/limitspec-tests
Unit test to surface bug with limit-spec order by over specific query intervals
2015-04-16 13:52:58 -07:00
Xavier Léauté 1d153674b6 remove overzealous check for backwards compatibility 2015-04-15 22:11:55 -07:00
Xavier Léauté ea5572d001 Merge pull request #1271 from metamx/strictErrorChecking
Add stricter checking for potential coding errors
2015-04-15 15:21:41 -07:00
Charles Allen abdeaa0746 Add stricter checking for potential coding errors
Can use via `mvn clean compile test-compile -P strict'
2015-04-15 14:52:25 -07:00
vkavuluri a2ba5b6183 Unit test to surface bug with limit-spec order by over specific query intervals 2015-04-15 06:31:22 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Xavier Léauté bafc5114b4 add toString, equals, and hashCode to BitmapSerdeFactory 2015-04-14 10:44:18 -07:00
Xavier Léauté d20128b89b add compressed variable-size ints column type 2015-04-14 10:44:18 -07:00
Xavier Léauté ce928d9636 add compressed ints column type 2015-04-14 10:44:17 -07:00
Xavier Léauté 5c23679238 add WritableSupplier and IndexedMultivalue 2015-04-14 10:44:17 -07:00
Xavier Léauté 1abb9cce7c make IndexedInts closeable + add fill method 2015-04-14 10:44:17 -07:00
Xavier Léauté ed0d49933e fix memory leak in CompressedXXXIndexedSupplierTest 2015-04-14 10:44:16 -07:00
Xavier Léauté 6790e6cf0f add fromList to CompressedLongsIndexedSupplier 2015-04-14 10:44:16 -07:00
Eric Tschetter 7517f0d0f0 Add some javadoc to the two Query processing interfaces to help aid in implementations of new Queries.
Also, remove some comments that did not have enough context to actually make sense to anyone but the original author (at least, I hope they make sense to the author, I definitely don't know what was being said).
2015-04-09 18:11:42 -07:00
Fangjin Yang 208e307915 Merge pull request #1251 from metamx/uriSegmentLoaders
Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
2015-03-30 17:43:51 -07:00
fjy aea7f9d192 [maven-release-plugin] prepare for next development iteration 2015-03-30 16:35:24 -07:00
fjy 060d7aef03 [maven-release-plugin] prepare release druid-0.7.1 2015-03-30 16:35:20 -07:00
Charles Allen 1c6cbea89c Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
This reverts commit f904bc7858.
2015-03-30 13:40:04 -07:00
Fangjin Yang f904bc7858 Revert "Overhaul of SegmentPullers to add consistency and retries" 2015-03-30 13:15:50 -07:00
Charles Allen 6d407e8677 Add URI handling to SegmentPullers
* Requires https://github.com/druid-io/druid-api/pull/37
* Requires https://github.com/metamx/java-util/pull/22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
2015-03-30 12:33:23 -07:00
Fangjin Yang e5653f0752 Merge pull request #1190 from vigiglobe/master
Fix NPE when partionNumber 0 does not exist.
2015-03-26 13:25:39 -07:00
Xavier Léauté 389ea4c32f Merge pull request #1245 from b-slim/fix_injector_plus_ut
Bug fix @DruidSecondaryModule plus unit test
2015-03-26 10:04:44 -07:00
Fangjin Yang a9c47de571 Merge pull request #1243 from metamx/fix-union-timeline-lookup
fixes TimeboundaryQuery and DataSourceMetadata queries returning wrong values for union queries
2015-03-26 10:02:56 -07:00
Slim Bouguerra 1e6be7796e bug fix @DruidSecondaryModule plus unit test 2015-03-26 10:44:52 -05:00
nishantmonu51 638bf9d4e9 return sorted List of TimeLineObjectHolder 2015-03-26 11:51:09 +05:30
msprunck 942c17a2aa Remove timeline chunk count assumptions.
* Replace with generic iterables
2015-03-24 22:40:49 +01:00
Prajwal Tuladhar 9983216871 use https maven repo URL to download dependencies 2015-03-20 14:09:07 -04:00
fjy b389cfe404 [maven-release-plugin] prepare for next development iteration 2015-03-19 12:38:17 -07:00
fjy 60e7d543cc [maven-release-plugin] prepare release druid-0.7.1-rc1 2015-03-19 12:38:13 -07:00
nishantmonu51 39e60b3405 fix race in groupByParallelQueryRunner
add UT and use a queue for better concurrency
2015-03-17 20:57:05 +05:30
Xavier Léauté 127b6fd857 Merge pull request #1172 from himanshug/segment_metadata_eager
force eager the processing of segment metadata query on the processing executor
2015-03-12 10:19:48 -07:00
Xavier Léauté 0a5a3fe2dc fix file missing from rebase 2015-03-11 17:30:11 -07:00
Xavier Léauté e01ed16030 serde tests + equals/hashCode fixes for extraction functions 2015-03-11 16:48:28 -07:00
Xavier Léauté d3f5bddc5c Add ability to apply extraction functions to the time dimension
- Moves DimExtractionFn under a more generic ExtractionFn interface to
  support extracting dimension values other than strings
- pushes down extractionFn to the storage adapter from query engine
- 'dimExtractionFn' parameter has been deprecated in favor of 'extractionFn'
- adds a TimeFormatExtractionFn, allowing to project the '__time' dimension
- JavascriptDimExtractionFn renamed to JavascriptExtractionFn, adding
  support for any dimension value types that map directly to Javascript
- update documentation for time column extraction and related changes
2015-03-11 16:45:42 -07:00
Himanshu Gupta 55ebf0cfdf force eager the processing of segment metadata query on the processing threadpool by using ChainedExecutionQueryRunner in SegmentMetadataQueryRunnerFactory.mergeRunners(..) 2015-03-11 12:58:58 -05:00
Xavier Léauté 217e674063 Handling aggregators and post aggregators with duplicate names
* add test for same-name groupBy hyperUniques post-agg
* add test for same-name post-agg in groupby with approx histogram
* Fixes https://github.com/druid-io/druid/issues/1045
* Throws an error if post aggs and aggs do not have unique names
* Add more groupBy tests for Having filters
2015-03-10 17:10:43 -07:00
Fangjin Yang 0b467624ec Merge pull request #694 from druid-io/arithmetic-op-strategies
normal division & configurable ordering for ArithmeticPostAggregator
2015-03-10 13:48:27 -07:00
Fangjin Yang 2abdce1dc0 Merge pull request #1180 from metamx/logging-groupBy-NPE
add null check early to catch root cause for groupBy NPE while running bySegment query
2015-03-09 09:16:33 -07:00
nishantmonu51 6e935cca0a add null check early to catch root cause 2015-03-09 21:10:28 +05:30
Xavier Léauté 0d47c0c36d normal division and configurable ordering for ArithmeticPostAggregator
Fixes #510
2015-03-04 12:44:24 -08:00
Fangjin Yang d685e2ab04 Merge pull request #1165 from friedhardware/fix-NPerror-select
Added null check for the pagingSpec on a Select Query.
2015-03-02 14:17:06 -08:00
Fangjin Yang e8605c63a9 Merge pull request #1150 from himanshug/broker-parallel-chunk-process
interval chunk query runner now processes individual chunk in a threadpool
2015-03-02 13:50:23 -08:00
Himanshu Gupta 29039fd541 interval chunk query runner now processes individual chunk in a thread pool and prints metrics query/time per chunk 2015-03-02 15:45:09 -06:00
Joshua Schumacher e6130e0fdc Added null check for the pagingSpec on a Select Query. 2015-03-02 12:41:59 -08:00
Fangjin Yang 005f4da2c0 Merge pull request #1143 from metamx/update-rhino-1.7rc5
Update Rhino to 1.7RC5
2015-02-25 12:50:23 -08:00
Xavier Léauté b167dcf82c [maven-release-plugin] prepare for next development iteration 2015-02-23 14:28:06 -08:00
Xavier Léauté e81ac2ba43 [maven-release-plugin] prepare release druid-0.7.0 2015-02-23 14:27:58 -08:00
James Estes 562de6c621 Update docs and examples for log4j2 usage.
- Put configs early in classpath in examples so log4j2.xml will get picked up properly
- Add an example log4j2.xml file.
- Update Logging doc.
2015-02-19 11:40:56 -07:00
Xavier Léauté c4d721fffd update Rhino to 1.7RC5 2015-02-19 09:48:18 -08:00
Xavier Léauté 78df7f6165 Move Druid release artifacts to Sonatype
- Switch to using Druid parent POM
- Add required fields for Sonatype
- Common plugin versions and settings have been moved to the parent pom
- Cleanup artifacts and POMs for consistent formatting
- Remove org.hyperic.sigar dependency and update docs to reflect necessary jars to add at runtime when sigar is needed
2015-02-13 14:26:31 -08:00
fjy d29740ed9f [maven-release-plugin] prepare for next development iteration 2015-02-12 16:16:00 -08:00
fjy 211fd15b7e [maven-release-plugin] prepare release druid-0.7.0-rc3 2015-02-12 16:15:56 -08:00
Fangjin Yang 90bc62eb5c Merge pull request #1108 from metamx/improve-groupby-perf
Improve groupby by removing conversion to case insensitive row
2015-02-12 11:45:20 -08:00
nishantmonu51 15cf432b74 remove conversion to case insensitive row
this is not required after death to casing in 0.7
2015-02-11 19:40:36 +05:30
Xavier Léauté c5e99bf6ec Merge pull request #1105 from metamx/fixEmptyExtractionFilter
Fix empty results on ExtractionFilter.
2015-02-10 14:25:58 -08:00
Charles Allen b9cb311a52 Fix empty results on ExtractionFilter.
* Now returns empty results rather than erroring out
* Added unit tests for multiples case
2015-02-10 14:04:38 -08:00
fjy 708759e1e0 Update http-client to 1.0.0 2015-02-10 13:36:47 -08:00
Xavier Léauté a7dcaffb53 fix `__time` column selector for incremental index
- also adds tests for selecting the time column
2015-02-06 12:06:05 -08:00
Fangjin Yang 42e902b6e3 Merge pull request #1090 from metamx/alphanum-attribution
update code attribution
2015-02-04 15:51:34 -08:00
Xavier Léauté 0fbc6071c9 update code attribution 2015-02-04 15:28:44 -08:00
Fangjin Yang 25cf15824b Merge pull request #1085 from gianm/dsmrv-fix
DataSourceMetadataResultValue fixes and JodaUtils adjustments.
2015-02-03 17:51:33 -08:00
Gian Merlino 085ad8d345 Fix DataSourceMetadataResultValue serde. 2015-02-03 17:39:42 -08:00
fjy 1f12c5b2f1 [maven-release-plugin] prepare for next development iteration 2015-02-03 12:06:49 -08:00
fjy e82d431be7 [maven-release-plugin] prepare release druid-0.7.0-rc2 2015-02-03 12:06:41 -08:00
Xavier Léauté 4eff269536 Merge pull request #1079 from druid-io/cleanup-deps
Remove non friendly dependencies from Druid
2015-02-03 11:56:41 -08:00
fjy 3e5d338c8e Remove non friendly dependencies from Druid 2015-02-03 11:36:08 -08:00
Fangjin Yang 71b4c5fa86 Merge pull request #1076 from metamx/remove-threadlocals
remove thread-locals in GenericIndexed in favor of wrapped objects
2015-02-02 20:02:33 -08:00
Xavier Léauté cb2e300eba remove thread-locals in GenericIndexed in favor of wrapped objects to reduce GC pressure 2015-02-02 15:59:30 -08:00
Eric Tschetter 42eba986ce Towards consistent null handling
This commit also includes
1) the addition of a context parameter on timeseries queries that allows it to ignore empty buckets instead of generating results for them
2) A cleanup of an unused method on an interface
2015-02-02 12:53:07 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté f24a89a22a fix NPE for topN over missing hyperUniques column 2015-01-27 16:12:41 -08:00
Charles Allen 226dd91a31 Add a hash map for storing groupBy partition index
* Improves groupBy performance by approx 15%
2015-01-26 08:42:02 -08:00
fjy 1f94de22c6 [maven-release-plugin] prepare for next development iteration 2015-01-20 14:23:55 -08:00
fjy 17476edc31 [maven-release-plugin] prepare release druid-0.7.0-rc1 2015-01-20 14:23:51 -08:00
Charles Allen 3d27747f7e Upgrade to log4j2
Default behavior is as before.
Added documentation for how to enable synchronous logging for select chatty classes:
* io.druid.client.ServerInventoryView
* io.druid.client.BatchServerInventoryView
* io.druid.curator.inventory.CuratorInventoryManager
* com.metamx.http.client.pool.ChannelResourceFactory
2015-01-20 12:35:18 -08:00
Fangjin Yang 91a79dbf95 Merge pull request #1031 from metamx/ingestmetadata-query
DataSourceMetadata query
2015-01-19 21:55:35 -08:00
Charles Allen 7bb038756c Account for very slow writer threads in IncrementalIndexTest 2015-01-17 13:02:59 -08:00
Fangjin Yang b4041c13e5 Merge pull request #1029 from metamx/fixChainedExecutionQueryRunnerTest
Address spurious test failures
2015-01-16 13:08:32 -08:00
Xavier Léauté 3b3aad78cb Merge pull request #1027 from metamx/concurrentOnHeapIncrementalIndexFix
Fix concurrency issues in OnheapIncrementalIndex
2015-01-16 12:54:42 -08:00
Charles Allen 197af967ef Fix concurrency issues in OnheapIncrementalIndex
* Was encountering weird errors when fast writes were coming in while queries were happening.
* Added unit tests which tend to cause concurrency query problems
2015-01-16 12:01:46 -08:00
Charles Allen ebafa2a786 Fix spurious test failures in ChainedExecutionQueryRunnerTest 2015-01-15 16:49:16 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
nishantmonu51 c7452b75f6 Merge branch 'master' into ingestmetadata-query 2015-01-15 18:00:31 +05:30
Xavier Léauté d5f4182de4 global test timeouts + fix test race condition 2015-01-07 23:36:57 -08:00
Fangjin Yang 852e863425 Merge pull request #981 from druid-io/strictModuleTyping
Use Module instead of generic Object in Guice related items
2015-01-05 12:43:20 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Xavier Léauté 3fc6cf918d add test for large chunks 2015-01-02 14:31:22 -08:00
Xavier Léauté f2f9cbeca8 throw error rather than returning garbage results 2015-01-02 14:29:21 -08:00
Xavier Léauté 071943a367 fix LZF compression with buffers exceeding LZF chunk size 2015-01-02 11:39:50 -08:00
Xavier Léauté f2439899e7 fix bitmap factory serde 2014-12-23 15:07:32 -08:00
Xavier Léauté 27a3169312 increase test timeouts 2014-12-19 17:09:43 -08:00
Charles Allen 971afab36f Lengthen CompressionStrategyTest::testKnownSizeConcurrency() to have 2m timeout on its test to account for shared Jenkins build lag 2014-12-19 12:53:20 -08:00
Charles Allen 7c8d4a7433 Use Module instead of generic Object in Guice related items 2014-12-19 10:54:06 -08:00
Fangjin Yang be507b8cb4 Merge pull request #943 from mrijke/partialdimextractfn-nullpointer
Fix NullPointerException in PartialDimExtractionFn
2014-12-16 12:29:27 -07:00
nishantmonu51 80e4b68ee7 review comments 2014-12-16 21:16:48 +05:30
Fangjin Yang b3fe91bb50 Merge pull request #830 from metamx/union-merge-on-historical
Union merge on historical
2014-12-15 13:36:47 -07:00
fjy 3cb7999eb9 i hate hadoop dependencies 2014-12-15 09:52:46 -08:00
nishantmonu51 a0d3579a92 add docs + fix tests 2014-12-11 17:58:01 +05:30
nishantmonu51 7ad03087c0 Merge branch 'master' into ingestmetadata-query 2014-12-11 16:54:38 +05:30
nishantmonu51 32b4f55b8a review comments refactoring 2014-12-11 16:33:14 +05:30
nishantmonu51 3763357f6e Ingest metadata query implementation 2014-12-10 19:44:00 +05:30
Fangjin Yang d6d3ec6846 Merge pull request #948 from metamx/ingestion-docs
Redocumenting ingestion
2014-12-09 15:30:03 -07:00
fjy 9596c11f42 address cr 2014-12-09 14:19:18 -08:00
nishantmonu51 1a1b0e6f23 merge from master and review comments 2014-12-09 13:16:45 +05:30
xvrl 1392e2731f Merge pull request #936 from metamx/cachingRunnerImprovements
General Caching Query Runners cleanup (40% query time reduction for HLL)
2014-12-08 14:07:52 -08:00
Charles Allen 7b65f0635d General Caching Query Runners cleanup
* Add type strictness to CachingClusteredClient.
* Add background caching to CachingClusteredClient. Gives between 0% and 5% query speed increase.
* Add @BackgroundCaching annotation for injected ExecutorService items
* Add `numBackgroundThreads' configuration options to CacheConfig (default 0 aka same thread legacy behavior)
* Add unit tests for CacheConfig
* Add an abstract caching query runner class, currently it doesn't do anything exceppt simply make the two caching queries distinct.
* Add caching to CachingQueryRunner. Gives up to a WHOPPING 40% reduction in query time on HLL queries
* Updated docs with more info on cache settings.
2014-12-08 13:29:32 -08:00
Maarten Rijke 90670a9c7e Fix NullPointerException in PartialDimExtractionFn by explicity checking for dimValue == null, attempt 2 2014-12-08 22:26:35 +01:00
Maarten Rijke bd9bbf396c Fix NullPointerException in PartialDimExtractionFn by explicity checking for dimValue == null 2014-12-08 20:11:58 +01:00
Xavier Léauté ad23e49777 use fixed-size mapdb cache to avoid heap growing uncontrollably 2014-12-05 15:34:50 -08:00
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
Xavier Léauté c7dbe6116c write byte data as is in smile 2014-12-04 10:57:56 -08:00
Xavier Léauté c21a82a697 upgrade LZ4 to operate directly on ByteBuffers 2014-12-04 10:57:56 -08:00
Xavier Léauté 0c521e0a77 update joda-time and fix min/max instant 2014-12-04 10:57:56 -08:00
nishantmonu51 269a51964e fix size calculation 2014-12-04 17:22:24 +05:30
nishantmonu51 4dc0fdba8a consider mapped size in limit calculation & review comments 2014-12-03 23:47:30 +05:30
Charles Allen 529e7e0272 Merge pull request #927 from metamx/speedup-smile-bytes
Improve Smile serde performance by writing binary data as is
2014-12-03 10:02:08 -08:00
Charles Allen 0f5d5840da Merge pull request #924 from metamx/update-joda-time
Update Joda-Time and fix min/max instant overflow
2014-12-03 09:15:39 -08:00
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Xavier Léauté 5fece517fa write byte data as is in smile 2014-12-03 00:01:01 -08:00
Xavier Léauté 18f50097a9 upgrade LZ4 to operate directly on ByteBuffers 2014-12-02 23:53:56 -08:00
fjy bc173d14fc a whole bunch of cleanup and fixes 2014-12-02 17:32:05 -08:00
Xavier Léauté a79389a9e5 update joda-time and fix min/max instant 2014-12-02 17:27:22 -08:00
nishantmonu51 b65933ffb8 make tests parameterised 2014-12-02 23:55:29 +05:30
nishantmonu51 6dc69c2f30 code cleanups & formatting 2014-12-02 22:44:33 +05:30
nishantmonu51 eac776f1a7 tests passing with on heap incremental index 2014-12-02 22:29:28 +05:30
Xavier Léauté 4eee7e69b9 fix cardinality aggregator caching 2014-11-26 15:00:37 -08:00
xvrl 5bc1be5ba0 Merge pull request #850 from metamx/druid-0.7.x-compressionstrategy
Compression strategy changes
2014-11-25 12:58:39 -08:00
Charles Allen c6043afa32 Removed empty function from CompressionStrategyTest 2014-11-25 12:57:06 -08:00