Commit Graph

1974 Commits

Author SHA1 Message Date
Jihoon Son faf3f1e426 Fix cache keys of DefaultDimensionSpec and ExtractionDimensionSpec (#6390) 2018-09-26 20:08:53 -07:00
Nishant Bangarwa c9d281a2e9 Add ability to pass in Bloom filter from Hive Queries (#6222)
* Bloom filter initial implementation

fix checkstyle

review comments

Fix weird failure

review comments

Revert "Fix wierd failure"

This reverts commit a13a83ad7887e679f6d539191b52aeaaea85b613.

* fix test

* review comment
2018-09-26 16:04:26 -07:00
Jonathan Wei 00b0a156e9
Tweak isInvalidRows behavior in HadoopTuningConfig (#6339)
* Tweak isInvalidRows behavior in HadoopTuningConfig

* Fix tests
2018-09-24 16:13:13 -07:00
Alexander Saydakov 93345064b5 HllSketch module (#5712)
* HllSketch module

* updated license and imports

* updated package name

* implemented makeAggregateCombiner()

* removed json marks

* style fix

* added module

* removed unnecessary import, side effect of package renaming

* use ThreadLocalRandom

* addressing code review points, mostly formatting and comments

* javadoc

* natural order with nulls

* typo

* factored out raw input value extraction

* singleton

* style fix

* style fix

* use Collections.singletonList instead of Arrays.asList

* suppress warning
2018-09-24 08:41:56 -07:00
Jonathan Wei 609da01882 Fix dictionary ID race condition in IncrementalIndexStorageAdapter (#6340)
Possibly related to https://github.com/apache/incubator-druid/issues/4937

--------

There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following, when running queries with filters on String dimensions that hit realtime tasks: 

```
org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
	at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
```

When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, `StringDimensionIndexer.makeDimensionSelector` gets called eventually, which calls:

```
final int maxId = getCardinality();
...

 @Override
  public int getCardinality()
  {
    return dimLookup.size();
  }
```

So `maxId` is set to the size of the dictionary at the time that the `filterMatcher` is created.

However, the `maxRowIndex` which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049) is set after the `filterMatcher` is created.

If rows with new dictionary values are added after the `filterMatcher` is created but before `maxRowIndex` is set, then it is possible for the Cursor to return rows that contain the new values, which will have `id >= maxId`.

This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows with unknown dictionary IDs from being passed to the `filterMatcher`.
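
As a hedged, standalone sketch of that ordering principle (the class and variable names below are illustrative, not the actual Druid code):

```
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.IntPredicate;

// Toy model of the fix: snapshot the row bound *before* building the matcher,
// so the matcher's dictionary snapshot (maxId) covers every row the cursor
// will visit, even if a concurrent writer appends new values afterwards.
public class CursorOrderingSketch
{
  public static void main(String[] args)
  {
    List<String> dictionary = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));

    // 1) Snapshot first: rows (and dictionary ids) added later are never visited.
    final int maxRowIndex = dictionary.size() - 1;

    // 2) Build the matcher afterwards; its maxId is now >= every visited id.
    final int maxId = dictionary.size();
    IntPredicate filterMatcher = id -> {
      if (id >= maxId) {
        throw new IllegalStateException("id[" + id + "] >= maxId[" + maxId + "]");
      }
      return dictionary.get(id).compareTo("b") >= 0;
    };

    dictionary.add("d"); // concurrent ingestion after the snapshot

    for (int id = 0; id <= maxRowIndex; id++) {
      System.out.println(dictionary.get(id) + " -> " + filterMatcher.test(id));
    }
  }
}
```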

-----------

The included test triggers the error with a custom Filter + DruidPredicateFactory.

The DimensionSelector for predicate-based filter matching is created here in `Filters.makeValueMatcher`:

```
  public static ValueMatcher makeValueMatcher(
      final ColumnSelectorFactory columnSelectorFactory,
      final String columnName,
      final DruidPredicateFactory predicateFactory
  )
  {
    final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName);

    // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
    if (capabilities != null && capabilities.getType() == ValueType.LONG) {
      return getLongPredicateMatcher(
          columnSelectorFactory.makeColumnValueSelector(columnName),
          predicateFactory.makeLongPredicate()
      );
    }

    final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
        DimensionHandlerUtils.createColumnSelectorPlus(
            ValueMatcherColumnSelectorStrategyFactory.instance(),
            DefaultDimensionSpec.of(columnName),
            columnSelectorFactory
        );

    return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory);
  }
```

The test Filter adds a row to the IncrementalIndex in the test when the predicateFactory creates a new String predicate, after `DimensionHandlerUtils.createColumnSelectorPlus` is called.
2018-09-18 10:43:29 +04:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00
Roman Leventov d50b69e6d4 Prohibit LinkedList (#6112)
* Prohibit LinkedList

* Fix tests

* Fix

* Remove unused import
2018-09-13 18:07:06 -07:00
Gian Merlino d6cbdf86c2
Broker backpressure. (#6313)
* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.
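
A minimal sketch of supplying the per-query override through the query context map; only the key name comes from this commit, the numeric value is just an example:

```
import java.util.HashMap;
import java.util.Map;

// Per-query counterpart of the cluster-wide druid.broker.http.maxQueuedBytes
// property: cap the bytes queued per query before backpressure is applied.
public class MaxQueuedBytesContext
{
  public static void main(String[] args)
  {
    Map<String, Object> context = new HashMap<>();
    context.put("maxQueuedBytes", 10_000_000L); // example cap of ~10 MB
    System.out.println(context);
  }
}
```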

* Fix query context doc.
2018-09-10 09:33:29 -07:00
Himanshu d61f708ef5 make COMPLEX column optionally filterable in Druid code (#6223)
* make COMPLEX column filterable in Druid code

* Revert "make COMPLEX column filterable in Druid code"

This reverts commit 9fc6ec768c.

* complex columns can be optionally made filterable

* some types are always filterable

* add ColumnCapabilitiesImpl serde tests

* add SuppressWarnings annotation
2018-09-05 12:28:49 -07:00
Gian Merlino be6c901114 Like filter: Fix escapes escaping themselves. (#6295)
Escapes should escape themselves.
2018-09-05 09:29:07 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Himanshu 1fae6513e1 add "subtotalsSpec" attribute to groupBy query (#5280)
* add subtotalsSpec attribute to groupBy query

* don't send subtotalsSpec to downstream nodes from broker and other updates

* address review comment

* fix checkstyle issues after merge to master

* add docs for subtotalsSpec feature

* address doc review comments
2018-08-28 17:46:38 -07:00
Dayue Gao fcf8c8d53c RowBasedKeySerde should use empty dictionary in constructor (#6256) 2018-08-28 17:22:18 -07:00
Gian Merlino 4a8b09b6a9 Fix NPE on constant null numeric expressions. (#6232)
The bug was caused by makeExprEvalSelector returning a null object, which
it isn't supposed to do. Fixed this by renaming ConstantColumnValueSelector
to ConstantExprEvalSelector (it was only used for ExprEval anyway) and
putting logic in that class to make sure the selectors behave as expected.
2018-08-27 15:30:56 -07:00
Gian Merlino 71c1a70ff6 FilteredBufferAggregator: Fix missing relocate, isNull methods. (#6233) 2018-08-27 15:30:45 -07:00
Gian Merlino 157e75a1fe Minor followup to #6220. (#6231)
Adjustments to comments and usage of generics.
2018-08-27 12:01:44 -05:00
Gian Merlino cb40b6d369 Fix all inspection errors currently reported. (#6236)
* Fix all inspection errors currently reported.

TeamCity builds on master are reporting inspection errors, possibly
because there was a while where it was not running due to the Apache
migration, and there was some drift.

* Fix one more location.

* Fix tests.

* Another fix.
2018-08-26 18:36:01 -06:00
Gian Merlino 23ba6f7ad7 Fix four bugs with numeric dimension output types. (#6220)
* Fix four bugs with numeric dimension output types.

This patch includes the following bug fixes:

- TopNColumnSelectorStrategyFactory: Cast dimension values to the output type
  during dimExtractionScanAndAggregate instead of updateDimExtractionResults.
  This fixes a bug where, for example, grouping on doubles-cast-to-longs would
  fail to merge two doubles that should have been combined into the same long value.
- TopNQueryEngine: Use DimExtractionTopNAlgorithm when treating string columns
  as numeric dimensions. This fixes a similar bug: grouping on string-cast-to-long
  would fail to merge two strings that should have been combined.
- GroupByQuery: Cast numeric types to the expected output type before comparing them
  in compareDimsForLimitPushDown. This fixes #6123.
- GroupByQueryQueryToolChest: Convert Jackson-deserialized dimension values into
  the proper output type. This fixes an inconsistency between results that came
  from cache vs. not-cache: for example, Jackson sometimes deserializes integers
  as Integers and sometimes as Longs.
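
A standalone illustration of that Jackson behavior (plain jackson-databind, not Druid code):

```
import com.fasterxml.jackson.databind.ObjectMapper;

// Untyped deserialization returns Integer when the number fits in an int and
// Long otherwise, so cached and non-cached results can carry different types
// unless they are converted to the expected output type.
public class JacksonIntVsLong
{
  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    System.out.println(mapper.readValue("3", Object.class).getClass());          // class java.lang.Integer
    System.out.println(mapper.readValue("3000000000", Object.class).getClass()); // class java.lang.Long
  }
}
```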

And the following code-cleanup changes, related to the fixes above:

- DimensionHandlerUtils: Introduce convertObjectToType, compareObjectsAsType,
  and converterFromTypeToType to make it easier to handle casting operations.
- TopN in general: Rename various "dimName" variables to "dimValue" where they
  actually represent dimension values. The old names were confusing.

* Remove unused imports.
2018-08-25 14:31:46 -07:00
Himanshu a76bf9ab2a add ability to do optional rollup in AggregationTestHelper (#6213) 2018-08-22 16:38:36 -07:00
Benedict Jin 3647d4c94a Make time-related variables more readable (#6158)
* Make time-related variables more readable

* Patch some improvements from the code reviewer

* Remove unnecessary boxing of Long type variables
2018-08-21 15:29:40 -07:00
Kirill Kozlov 62e580050c Use JUnit TemporaryFolder rule instead of system temp folder (#6070)
* Use JUnit TemporaryFolder rule instead of system tmp folder

* Allow forbidding APIs that are not present in all mvn modules
2018-08-16 11:05:45 -07:00
Jihoon Son ecee3e0a24 Further optimize memory for Travis jobs (#6150)
* Further optimize memory for Travis jobs

* fix build

* sudo false
2018-08-10 22:03:36 -07:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.
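
A toy sketch of that early-stop idea (not the actual CachePopulator interface; all names here are illustrative):

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Accumulate per-query cache values only while the running size stays under
// maxEntrySize; once the cap is exceeded, drop everything and stop retaining
// further subvalues instead of buffering the whole result set.
public class ForegroundCacheAccumulator
{
  private final long maxEntrySize;
  private final List<byte[]> values = new ArrayList<>();
  private long accumulatedSize = 0;
  private boolean abandoned = false;

  public ForegroundCacheAccumulator(long maxEntrySize)
  {
    this.maxEntrySize = maxEntrySize;
  }

  public void add(byte[] value)
  {
    if (abandoned) {
      return; // stop retaining subvalues early
    }
    accumulatedSize += value.length;
    if (accumulatedSize > maxEntrySize) {
      values.clear();
      abandoned = true;
    } else {
      values.add(value);
    }
  }

  public List<byte[]> valuesToCache()
  {
    return abandoned ? Collections.emptyList() : values;
  }
}
```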

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Clint Wylie 62677212cc Order rows during incremental index persist when rollup is disabled. (#6107)
* order using IncrementalIndexRowComparator at persist time when rollup is disabled, allowing increased effectiveness of dimension compression, resolves #6066

* fix stuff from review
2018-08-06 14:17:48 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Jonathan Wei b9c445c780
Optimize filtered aggs with interval filters in per-segment queries (#5857)
* Optimize per-segment queries

* Always optimize, add unit test

* PR comments

* Only run IntervalDimFilter optimization on __time column

* PR comments

* Checkstyle fix

* Add test for non __time column
2018-08-01 14:39:38 -07:00
Andrés Gómez e270362767 Add stringLast and stringFirst aggregators extension (#5789)
* Add lastString and firstString aggregators extension

* Remove duplicated class

* Move first-last-string doc page to extensions-contrib

* Fix ObjectStrategy compare method

* Fix bad aggregator type name in doc

* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery

* Add getMaxStringBytes() method to support JSON serialization

* Fix null pointer exception at segment creation phase when the string value is null

* Control the valueSelector object class on BufferAggregators

* Perform all improvements

* Add java doc on SerializablePairLongStringSerde

* Refactor ObjectStrategy compare method

* Remove unused ;

* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests

* Remove unused imports

* Add license header

* Add class name to java doc class serde

* Throw exception if value is unsupported class type

* Move first-last-string extension into druid core

* Update druid core docs

* Fix null pointer exception when pair->string is null

* Add null control unit tests

* Remove unused imports

* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query

* Change SerializablePairLongString to extend SerializablePair

* Change vars from public to private

* Convert vars to primitive type

* Clarify compare comment

* Change IllegalStateException to ISE

* Remove TODO comments

* Control possible null pointer exception

* Add @Nullable annotation

* Remove empty line

* Remove unused parameter type

* Improve AggregatorCombiner javadocs

* Add filterNullValues option at StringLast and StringFirst aggregators

* Add filterNullValues option at agg documentation

* Fix checkstyle

* Update header license

* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR

* Fix StringFirstAggregatorCombiner

* Fix if condition at StringFirstAggregateCombiner

* Remove filterNullValues from string first/last aggregators

* Add isReset flag in FirstAggregatorCombiner

* Change Arrays.asList to Collections.singletonList
2018-08-01 10:52:54 -07:00
Roman Leventov 0754d78a2e Prohibit Lists.newArrayList() with a single argument (#6068)
* Prohibit Lists.newArrayList() with a single argument

* Test fixes

* Add Javadoc to Node constructor
2018-07-31 20:09:10 -07:00
Clint Wylie 20ae8aa626 Fix 'auto' encoded longs + compression serializer (#6045)
* Fix 'auto' encoded longs + compression serializer
Fixes #6044

changes:
* Fixes `VSizeLongSerde` serializers to treat 'close' as 'flush' when used with `BlockLayoutColumnarLongsSerializer`, allowing unwritten values to be flushed to the buffer when the block is compressed
* Add exhaustive unit test that flexes a variety of value sizes, row counts, and compression strategies to catch issues such as these

* refactor LongSerializer close to be named flush instead

* revert and just make new serializers per block
2018-07-30 18:35:20 -07:00
Roman Leventov f3595c93d9 Fix a bug in GroupByQueryEngine (#6062) 2018-07-30 14:39:38 -07:00
Gian Merlino c57f4a5db0
FinalizingFieldAccessPostAggregator: Fix serde. (#6067)
Fixes #6063.
2018-07-28 08:44:22 -07:00
Benedict Jin 331a0afb98 Remove redundant type parameters and enforce some other style and inspection rules (#5980)
* Various changes about druid-services module

* Patch improvements from reviewer

* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile

* Fix ArraysAsListWithZeroOrOneArgument

* Fix conflict

* Fix ToArrayCallWithZeroLengthArrayArgument

* Fix AliEqualsAvoidNull

* Remove blank line

* Remove unused import clauses

* Fix code style in TopNQueryRunnerTest

* Fix conflict

* Don't use Collections.singletonList when converting the type of array type

* Add argLine into maven-surefire-plugin in druid-processing module & increase the timeout value for testMoveSegment testcase

* Roll back the latest commit

* Add java.io.File#toURL() into druid-forbidden-apis

* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord

* Add a new regexp element into stylecode xml file

* Fix style error for new regexp

* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING

* Fix style error for new regexp

* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile

* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR

* Add toArray(new Object[0]) regexp into checkstyle config file & fix them

* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it

* Add a comment for string equals regexp in checkstyle config

* Fix code format

* Add RedundantTypeArguments as ERROR level inspection

* Fix cannot resolve symbol datasource
2018-07-27 16:56:49 -05:00
kaijianding 7919e4d5df move rangeSet compare into shardspec (#5688) 2018-07-26 14:17:57 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Benedict Jin b3021ec802 Fix bug in SegmentAnalyzer.analyzeComplexColumn() #5939 (#5954) 2018-07-09 15:36:16 -07:00
Surekha 441c9819d9 Support limit for timeseries query (#5894) (#5931)
* Support limit for timeseries query (#5894)

* Fix tests

* Address PR comments

* Try to fix teamcity inspection checks

* Remove unused method from VirtualColumns

* Remove unused import statement
2018-07-09 08:58:42 -07:00
Jihoon Son 10a01d6846 [SQL] Fix missing postAggregations for Timeseries and TopN (#5912)
* [SQL] Fix missing postAggregations for Timeseries and TopN

* fix build

* fix test
2018-06-29 10:36:55 -07:00
Jonathan Wei f3e1520360
Fix merge for TrueDimFilter (#5916)
* Fix merge for TrueDimFilter

* remove unused cache ID
2018-06-28 14:46:47 -07:00
scrawfor bf2a31a5bc Add new 'true' filter which always returns true. (#5711)
* Add new 'true' filter which always returns true.

* Add support for bitmap index.

* Adds documentation.

* Removes No-op Filter
2018-06-28 11:52:45 -07:00
zhangxinyu d857345b7d add method getRequiredColumns for DimFilter (#5872)
* add method getRequiredColumns for DimFilter

* deal with the NullPointerException when DimFilter is null
2018-06-27 15:45:46 -07:00
陈春斌 7649742943 Use ReentrantReadWriteLock in DimensionDictionary (#5883) 2018-06-25 12:35:26 -07:00
Nishant Bangarwa 1c031784cb Align long Aggregator implementation with Double and Float (#5861)
Add LongMin/Max aggregator combiners
Extract common code from LongSum/Min/MaxAggregatorFactories in
SimpleLongAggregatorFactory
2018-06-14 01:56:41 +04:00
Gian Merlino 3af95913a9 Lazy-ify IncrementalIndex filtering too. (#5852)
* Lazy-ify IncrementalIndex filtering too.

Follow-up to #5403, which only lazy-ified cursor-based filtering
on QueryableIndex.

* Fix logic error.
2018-06-06 18:03:34 -07:00
Gian Merlino 78fd27cdb2
Lazy-ify ValueMatcher BitSet optimization for string dimensions. (#5403)
* Lazy-ify ValueMatcher BitSet optimization for string dimensions.

The idea is that if the prior evaluated filters are decently selective,
such that they mean we won't see all possible values of the later
filters, then the eager version of the optimization is too wasteful.

This involves checking an extra bitset, but the overhead is small even
if the lazy-ification is useless.
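
A toy sketch of the lazy memoization idea (not the Druid ValueMatcher code):

```
import java.util.BitSet;
import java.util.function.IntPredicate;

// Instead of eagerly evaluating the predicate for every dictionary id up front,
// evaluate it the first time an id is actually seen and memoize the result in
// two bitsets: one marking "already checked", one marking "matched".
public class LazyDictionaryMatcher
{
  private final IntPredicate predicate;
  private final BitSet checked = new BitSet();
  private final BitSet matched = new BitSet();

  public LazyDictionaryMatcher(IntPredicate predicate)
  {
    this.predicate = predicate;
  }

  public boolean matches(int dictionaryId)
  {
    if (!checked.get(dictionaryId)) { // the small extra check per row
      checked.set(dictionaryId);
      if (predicate.test(dictionaryId)) {
        matched.set(dictionaryId);
      }
    }
    return matched.get(dictionaryId);
  }

  public static void main(String[] args)
  {
    LazyDictionaryMatcher matcher = new LazyDictionaryMatcher(id -> id % 2 == 0);
    System.out.println(matcher.matches(4)); // true, evaluated and memoized on first use
    System.out.println(matcher.matches(3)); // false
  }
}
```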

* Remove import.

* Minor transformation
2018-06-05 09:06:51 -07:00
Clint Wylie 2b45a6a42d Fix topN lexicographic sort (#5815)
* fixes #5814
changes:
* pass `StorageAdapter` to topN algorithms to get things like whether a column is 'sorted' or whether the query interval is smaller than segment granularity, instead of using `io.druid.segment.Capabilities`
* remove `io.druid.segment.Capabilities` since it had one purpose, supplying `dimensionValuesSorted` which is now provided directly by `StorageAdapter`.
* added test for topn optimization path checking

* add Capabilities back since StorageAdapter is marked PublicApi

* oops

* add javadoc, fix build i think

* correctly revert api changes

* fix intellij fail

* fix typo :(
2018-05-31 09:53:29 -07:00
Jihoon Son 9dca5ec76b Simple cleanup for ThreadPoolTaskRunner and SetAndVerifyContextQueryRunner / Add ThreadPoolTaskRunnerTest (#5557)
* Simple fix for ThreadPoolTaskRunner

* fix build

* address comments

* update javadoc

* fix build

* fix test

* add dependency
2018-05-15 22:53:11 +05:30
Alexander Saydakov 15864434be ArrayOfDoublesSketch module (#5148)
* ArrayOfDoublesSketch module

* UTF-8 fix

* javadoc, style fixes

* more style fixes

* null key selector fix

* more style fixes

* removed @Override, strict compiler doesn't like it

* removed @Override, strict compiler doesn't like it

* IndexedInts is not AutoCloseable? removed one more @Override

* synchronized with upstream master

* removed unused imports

* addressed review points

* null fix

* addressed review points

* IAE from druid package

* synchronized aggregate() and get()

* use locks per buffer position

* corrected javadoc

* style fixes

* added lock and narrowed the scope

* addressed review comments

* conflict resolution went wrong

* addressed review comments

* javadoc

* javadoc links

* fully qualified name since there is no import for this class

* addressed review points

* style fix

* StandardCharsets.UTF_8

* addressed review points

* added @Override

* added equals and hashCode tests for post aggs

* formatting

* suppress warnings

* optimal IndexedInts iteration

* suppress SelfEquals

* added comments about getClass() in equals()
2018-05-13 15:48:00 +03:00
Dylan Wylie e8caf02147 Revert "Use a bimap for reverse lookups on injective maps" (#5764)
* Revert "Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor (#5704)"

This reverts commit c7a59394e0.

* Revert "Fix metrics for inserting segments (#5749)"

This reverts commit c9d645103b.

* Revert "Typo fix in historical doc (#5753)"

This reverts commit aa23fe6386.

* Revert "Use a bimap for reverse lookups on injective maps (#5681)"

This reverts commit e1277d306c.
2018-05-09 19:12:36 -07:00
Surekha 2f8904e25f Check against the real default of maxBytes(1/6 max mem) in AppenderatorImpl's add (#5758)
* The check for maxBytesInMemory should be >= 0 instead of > 0

* if the default value is 0, the actual check could be skipped
* fix the message for persistReasons

* Address PR comments

* if maxBytes is set to -1, make it Long.MAX_VALUE, so we do not need to check whether it's 0 or -1
* set the maxBytesTuningconfig in AppenderatorImpl constructor to avoid duplicate code

* fix the failing test cases

* Address PR comments
2018-05-09 13:41:51 -07:00