Commit Graph

399 Commits

Author SHA1 Message Date
Roman Leventov f5d4171459 Prohibit for loops which could be foreach with IntelliJ (#4653)
* Replace for with foreach

* Replace for with for-each in GroupByQueryEngineV2

* Remove io.druid.collections.IntList
2017-08-08 18:05:33 -07:00
Roman Leventov aa7e4ae5e4 Enforce correct spacing with Checkstyle (#4651) 2017-08-05 10:18:25 -07:00
Jihoon Son f3f2cd35e1 Array-based aggregation for groupBy query (#4576)
* Array-based aggregation

* Fix handling missing grouping key

* Handle invalid offset

* Fix compilation

* Add cardinality check

* Fix cardinality check

* Address comments

* Address comments

* Address comments

* Address comments

* Cleanup GroupByQueryEngineV2.process

* Change to Byte.SIZE

* Add flatMap
2017-08-03 20:04:54 +03:00
Gian Merlino 5048ab3e96 Add metrics to the native queries underpinning SQL. (#4561)
* Add metrics to the native queries underpinning SQL.

This is done by factoring out the metrics and request log emitting
code from QueryResource into a new QueryLifecycle class. That class
is used by both QueryResource and the SQL DruidSchema and QueryMaker.

Also fixes a couple of bugs in QueryResource:

- RequestLogLine start time was set to `TimeUnit.NANOSECONDS.toMillis(startNs)`,
  which is incorrect since absolute nanos cannot be converted to millis.
- DruidMetrics.makeRequestMetrics was called with null `query` on
  unparseable queries, which led to spurious "Unable to log query"
  errors.

Partial fix for #4047.

* Code style

* Remove unused imports.

* Fix tests.

* Remove unused import.
2017-07-24 21:26:27 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Gian Merlino 2be7068f6e Fixes and improvements to SQL metadata caching. (#4551)
* Fixes and improvements to SQL metadata caching.

Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient.

SQL changes:
- Cache metadata on a per-segment level, in addition to per-dataSource, so
  we don't need to re-query all segments whenever a single new one appears.
  This should lower the load placed on the cluster by metadata queries.
- Fix race condition in DruidSchema that can cause us to miss metadata. It was
  possible to notice new segments, then issue a query, and have that query
  not actually hit those segments, and not notice that it didn't hit those segments.
  Then, the metadata from those segments would be ignored.
- Fix assumption in DruidSchema that all segments are immutable. Now, mutable
  segments are periodically re-queried.
- Fix inappropriate re-use of SchemaPlus. Now we create one for each planning
  cycle, rather than sharing one. It caches table objects, which we want to
  avoid, since it can cause stale metadata. We do the caching in DruidSchema
  so we don't need the SchemaPlus caching.

Server changes:
- Add a TimelineCallback to TimelineServerView, for callers that want to get updates
  when the timeline has been modified.
- Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This
  allows it to accept queries that are segment-descriptor-based rather than
  intervals-based. In particular it will now support MultipleSpecificSegmentSpec.

* Fix DruidSchema, and unused imports.

* Remove unused import.

* Fix SqlBenchmark.
2017-07-20 10:14:15 -07:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Chris Gavin 960cb07ea6 Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474)
* Remove some unnecessary use of boxed types.

* Fix some incorrect format strings.

* Enable IDEA's MalformedFormatString inspection.

* Add a Checkstyle check for finding uses of incorrect logging packages.

* Fix some incorrect usages of the metamx logger.

* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.

* Fix some more places where the wrong number of arguments are provided to format strings.

* Suppress `MalformedFormatString` inspection on legacy logging test.

* Use @SuppressWarnings rather than a noinspection suppression comment.

* Fix some more incorrect format strings.

* Suppress some more incorrect format string warnings where the incorrect string is intentional.

* Log the aggregator when closing it fails.

* Remove some unneeded log lines.
2017-07-13 12:15:32 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Roman Leventov 2fa4b10145 More fine-grained DI for management node types. Don't allocate processing resources on Router (#4429)
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router

* Fix examples

* Fixes

* Revert Peon configs and add comments

* Remove qualifier
2017-06-27 22:58:01 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Gian Merlino 679cf277c0 Add ExpressionFilter. (#4405)
* Add ExpressionFilter.

The expression filter expects a single argument, "expression", and matches
rows where that expression is true.

* Code review comments.

* CR comment.

* Fix logic.

* Fix test.

* Remove unused import.
2017-06-20 12:42:26 -07:00
Goh Wei Xiang f68a0693f3 Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397)
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.

* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex

* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
  - Remove setting of default values
  - Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders

* - Removing unused imports.

* - Helper method for testing with IncrementalIndex.Builder

* - Correction on javadoc.

* Style fix
2017-06-16 16:04:19 -05:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Roman Leventov 4ef3a770a3 Fix compilation of ExpressionBenchmark (#4379) 2017-06-07 14:18:09 -05:00
Roman Leventov b487fa355b More methods in QueryMetrics and TopNQueryMetrics (the last part of #3798) (#4284)
* Add more methods to QueryMetrics and TopNQueryMetrics, add BitmapResultFactory

* Add implementor expectations section to BitmapResultFactory javadoc
2017-06-07 09:49:08 -07:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
Gian Merlino fd55c894ce Add benchmark for expressions. (#4366)
* Add benchmark for expressions.

* Code review comments.
2017-06-05 21:29:08 -07:00
Roman Leventov 50e72c6aea Fix bugs (core) (#4339)
* Fix bugs

* Add test for GoogleDataSegmentPusher.buildPath()

* Exclude extension changes

* Address comments

* Brace
2017-06-02 06:47:59 +09:00
Jonathan Wei d49e53e6c2 Timeout and maxScatterGatherBytes handling for queries run by Druid SQL (#4305)
* Timeout and maxScatterGatherBytes handling for queries run by Druid SQL

* Address PR comments

* Fix contexts in CalciteQueryTest

* Fix contexts in QuantileSqlAggregatorTest
2017-05-23 16:57:51 +09:00
Benedict Jin e823085866 Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135) 2017-05-17 01:38:51 +09:00
Roman Leventov 1ebfa22955 Update Error prone configuration; Fix bugs (#4252)
* Make Errorprone the default compiler

* Address comments

* Make Error Prone's ClassCanBeStatic rule a error

* Preconditions allow only %s pattern

* Fix DruidCoordinatorBalancerTester

* Try to give the compiler more memory

* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now

* Don't show compiler warnings

* Try different travis script

* Fix travis.yml

* Make Error Prone optional again

* For error-prone compiler

* Increase compiler's maxmem

* Don't run Error Prone for benchmarks because of OOM

* Skip install step in Travis

* Remove MetricHolder.writeToChannel()

* In travis.yml, check compilation before tests, because it may fail faster
2017-05-12 15:55:17 +09:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Roman Leventov 8277284d67 Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239) 2017-05-04 11:14:41 -07:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Roman Leventov 73d9b31664 GenericIndexed minor bug fixes, optimizations and refactoring (#3951)
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream

* Fixes

* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils

* Move constructors

* Add GenericIndexedBenchmark

* Comments

* Typo

* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe

* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap

* Optimize TableLongEncodingWriter

* Add checks to SerializerUtils methods

* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()

* Update GenericIndexedBenchmark

* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian

* Add GenericIndexedBenchmark.indexOf()

* More checks in methods in SerializerUtils

* Use helperBuffer.arrayOffset()

* Optimizations in SerializerUtils
2017-03-27 14:17:31 -05:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Jonathan Wei a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Akash Dwivedi 797488a677 Removing Integer.MAX column size limit. (#3743)
* Removing Integer.MAX column size limit.

* On demand creation of headerLong, use v2 instead of v3

* Avoid reusing the same object from a previous test.

* Avoid reusing the same object from a previous test part#2

* code formatting.

* GenericIndexed/Writer code review changes.

* GenericIndexed/writer code review requested changes.

* checkIndex() to static

* native endianess for genericIndexedV2, code  review requested changes.

* Formatting

* Hll fix.

* use native endianess during bag size calculation.

* Code review requested changes.

* IOPeon close() changes.

* use different tmp directory path for testing.

* Code review requested changes.
2017-02-16 20:09:43 -06:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
DaimonPl a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Akash Dwivedi 8854ce018e File.deleteOnExit() (#3923)
* Less use of File.deleteOnExit()
 * removed deleteOnExit from most of the tests/benchmarks/iopeon
 * Made IOpeon closable

* Formatting.

* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
2017-02-13 15:12:14 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Jihoon Son ddd8c9ef97 Add filter selectivity estimation for auto search strategy (#3848)
* Add filter selectivity estimation for auto search strategy

* Addressed comments

* Lazy bitmap materialization for bitmap sampling and java docs

* Addressed comments.

- Fix wrong non-overlap ratio computation and added unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>

* Addressed comments

- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()

* Fix test failure and add a test for RangeIntList

* fix code style

* Diabled selectivity estimation for multi-valued dimensions

* Address comment
2017-02-06 11:15:03 -08:00
Gian Merlino ac84a3e011 SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868)
* SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE.

* Fix bug with re-use of filtered approximate histogram aggregators.

Also add APPROX_QUANTILE tests for filtering and running on complex columns.
Includes some slight refactoring to allow tests to make DruidTables that
include complex columns.

* Remove unused import
2017-01-25 18:39:26 -08:00
Gian Merlino d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
Akash Dwivedi dd0c4e2ead Migrating extendedset from Metamarkets. (#3694)
* Migrating extendedset from Metamarkets.

* Notice change

* More details in NOTICE

* NOTICE formatting.

* suppress header checkstlye for extendedset.
2017-01-17 10:10:27 -08:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Jihoon Son c099977a5b Add an option to SearchQuery to choose a search query execution strategy (#3792)
* Add an option to SearchQuery to choose a search query execution strategy.

Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query

* Add SearchStrategy and SearchQueryExecutor

* Address comments

* Rename strategies and set UseIndexesStrategy as the default strategy

* Add a cost-based planner for auto strategy

* Add document

* Fix code style

* apply code style

* apply comments
2017-01-10 18:04:20 -08:00
Roman Leventov 33800122ad Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631) 2016-12-26 00:35:35 -06:00
Gian Merlino dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00
Jihoon Son 5e39578eee Enable parallel test (#3774)
* Enable parallel test

* Remove unnecessary NotThreadSafe annocation

* Randomize the start port when finding available ports

* Fix test failure

* Change to handle all negatives
2016-12-14 21:05:56 -08:00
Roman Leventov d9807835f9 Add BitmapIterationBenchmark and update to JMH 1.17.2 (#3751) 2016-12-07 15:50:33 -08:00
Gian Merlino b1bac9f2d3 groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740)
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
2016-12-06 16:17:32 -08:00
Navis Ryu c74d267f50 Support virtual column for select query (#2511)
* Support virtual column for select query

* Addressed comments
2016-12-05 15:14:35 -08:00
Akash Dwivedi 3e408497b3 Migrating bytebuffercollections from Metamarkets. (#3647)
* Migrating  bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
2016-11-11 10:51:07 -08:00
Gian Merlino 2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
Himanshu 32c5494e97 eagerly allocate the intermediate computation buffers (#3628) 2016-10-31 15:24:07 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Gian Merlino 6827c09311 GroupByBenchmark: Fix queryable index generation, improve memory use. (#3431)
With the old code, all on-disk segments were the same. Now they're different.
This will end up altering benchmark results for queryMultiQueryableIndex,
likely making them slower (since values won't group as well as they used to).

The memory changes will help test with larger/more segments, since we won't
have to hold them all in memory at once.
2016-09-06 14:37:55 -07:00
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Gian Merlino 1eb7a7e882 Restore optimizations in BoundFilter. (#3343) 2016-08-10 08:53:17 -07:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Dave Li bc20658239 groupBy nested query using v2 strategy (#3269)
* changed v2 nested query strategy

* add test for #3239

* update for new ValueMatcher interface and add benchmarks

* enable time filtering

* address PR comments

* add failing test for outer filter aggregator

* add helper class for sharing code

* update nested groupby doc

* move temporary storage instantiation

* address PR comment

* address PR comment 2
2016-08-01 18:30:39 -07:00
Jonathan Wei a6105cbb86 Add numeric StringComparator (#3270)
* Add numeric StringComparator

* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query

* Address PR comments, add multithreaded BoundDimFilter test

* Add comment on strlen tie handling

* Add timeseries interval filter benchmark

* Adjust docs

* Use jackson for StringComparator, address PR comments

* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)

* More TopNMetricSpec and SearchSortSpec tests

* Fix NewSearchSortSpec serde

* Update docs for new DimensionTopNMetricSpec

* Delete NumericDimensionTopNMetricSpec

* Delete old SearchSortSpec

* Rename NewSearchSortSpec to SearchSortSpec

* Add TopN numeric comparator benchmark, address PR comments

* Refactor OrderByColumnSpec

* Add null checks to NumericComparator and String->BigDecimal conversion function

* Add more OrderByColumnSpec serde tests
2016-07-29 15:44:16 -07:00
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Gian Merlino c622a25236 BenchmarkDataGenerator: Don't generate timestamps at the end instant of the interval. (#3242)
Because timestamps at the end instant are not actually part of the interval. This
affected benchmark numbers, since it meant some data points would not be queried
(the interval for the query was based on getDataInterval) and also the
TimestampCheckingOffsets could not use the allWithinThreshold optimization.
2016-07-14 10:20:10 +05:30
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Jonathan Wei 24860a1391 Two-stage filtering (#3018)
* Two-stage filtering

* PR comment
2016-06-22 16:08:21 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Jonathan Wei b72c54c4f8 Add benchmark data generator, basic ingestion/persist/merge/query benchmarks (#2875) 2016-05-25 16:39:37 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Xavier Léauté e79284da59 new interval based cost function (#2972)
* new interval based cost function

Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression

New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices

* comments and formatting

* add more comments to explain the calculation
2016-05-17 09:56:00 -07:00
John Wang 5658bd99eb added contextual time parse (#2867) 2016-04-25 13:35:10 -07:00
Gian Merlino 7c0b1dde3a DimensionPredicateFilter: Skip unnecessary dictionary lookup. (#2850) 2016-04-18 12:38:25 -07:00
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Gian Merlino 2970b49adc BoundFilter optimizations, and related interface changes.
BoundFilter:

- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points,
  then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values
  matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls,
  now it matches nulls if "" is allowed by the lower limit and not excluded by the
  upper limit.

Interface changes:

- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a
  value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.

Also added tests for SelectorFilter, NotFilter, and BoundFilter.
2016-03-25 14:11:48 -07:00
Gian Merlino 1869c350d8 Move FlattenJSONBenchmarkUtilTest into the correct location. 2016-03-24 15:47:54 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Himanshu Gupta 02dfd5cd80 update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance 2016-03-10 16:11:48 -06:00
jon-wei c17ce02467 Allow IncrementalIndex to store Long/Float dimensions 2016-02-24 13:51:57 -08:00
jon-wei c53bf85d83 Add docs and benchmark for JSON flattening parser 2015-12-09 16:13:30 -08:00
Himanshu Gupta 62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
Xavier Léauté e4ac78e43d bump next snapshot to 0.9.0 2015-10-20 13:46:13 -07:00
Xavier Léauté 4c2c7a2c37 update version to 0.8.3 2015-10-14 21:40:55 -07:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Charles Allen 7fe8562980 Remove locks from StupidPool 2015-08-05 19:24:56 -07:00
nishant efb6e0649e simplify merging of results on broker.
add benchmark

formatting and review comments

organize imports
2015-07-27 11:46:11 -07:00
Xavier Léauté 4cfb00bc8a inrement version 2015-07-15 13:09:05 -07:00
Nishant 6fa49b771c Merge pull request #1453 from metamx/concicebenchmarkComplement
Update bytebuffer-collections and add basic ConciseComplementBenchmark for testing concise complements
2015-06-23 08:41:42 +05:30
Charles Allen 88b2ef5d6b Update bytebuffer-collections 2015-06-22 18:07:24 -07:00
Charles Allen b72a70522b Add basic ConciseComplementBenchmark for testing concise complements 2015-06-19 16:47:41 -07:00
Xavier Léauté 0a5bb909a2 [maven-release-plugin] prepare for next development iteration 2015-06-18 17:35:19 -07:00
Xavier Léauté 59c6b2b279 [maven-release-plugin] prepare release druid-0.8.0-rc1 2015-06-18 17:35:14 -07:00
nishant 37873eb4f0 more benchmarks + filter
fix test

Modify test to access random filtered rows

formatting

fix test

remove main method

Add newline

Add newline
2015-06-04 22:12:29 +05:30
nishant 81415282aa Enabling compression for multiValued dimension
Add test and refactoring

Add benchmark tests
2015-05-27 00:09:14 +05:30