Commit Graph

2018 Commits

Author SHA1 Message Date
Clint Wylie 66f64cd8bd fix long/float/double dimension filtering for columns with nulls (#6906)
* fix long,float, double dimension filtering when sql compatible null handling is enabled and the column has null values

* revert unintended change

* fix tests
2019-01-23 22:36:52 -08:00
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Jihoon Son cc06e7e2df Fix fallback to cursor-based plan in UseIndexesStrategy (#6875)
* Fix fallback to cursor-based plan in UseIndexesStrategy

* fix build

* add a comment
2019-01-18 10:41:01 +08:00
Jonathan Wei 68f744ec0a
Fixed buckets histogram aggregator (#6638)
* Fixed buckets histogram aggregator

* PR comments

* More PR comments

* Checkstyle

* TeamCity

* More TeamCity

* PR comment

* PR comment

* Fix doc formatting
2019-01-17 14:51:16 -08:00
Dayue Gao 5b8a221713 Add SQL id, request logs, and metrics (#6302)
* use SqlLifecyle to manage sql execution, add sqlId

* add sql request logger

* fix UT

* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

* add docs and more sql request logger impls

* add UT for http and jdbc

* fix forbidden use of com.google.common.base.Charsets

* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId

* do not use default method in QueryMetrics interface

* capitalize 'sql' everywhere in the non-property parts of the docs

* use RequestLogger interface to log sql query

* minor bugfixes and add switching request logger

* add filePattern configs for FileRequestLogger

* address review comments, adjust sql request log format

* fix inspection error

* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
2019-01-15 23:12:59 -08:00
Charles Allen 5d2947cd52 Use Guava Compatible immediate executor service (#6815)
* Use multi-guava version friendly direct executor implementation

* Don't use a singleton

* Fix strict compliation complaints

* Copy Guava's DirectExecutor

* Fix javadoc

* Imports are the devil
2019-01-11 10:42:19 -08:00
Richard Startin 99097617a1 use RoaringBitmapWriter for RoaringBitmap construction (#6764) 2019-01-08 17:18:41 -08:00
dongyifeng def823124c add version comparator for StringComparator (#6745)
* add version comparator for StringComparator

* add more test case and docs
2019-01-08 17:17:03 -08:00
Clint Wylie 9505074530 fix log typo (#6755)
* fix log typo, add DataSegmentUtils.getIdentifiersString util method

* fix indecisive oops
2018-12-18 15:10:25 -08:00
Clint Wylie 486c6f3cf9 emit logs that are only useful for debugging at debug level (#6741)
* make logs that are only useful for debugging be at debug level so log volume is much more chill

* info level messages for total merge buffer allocated/free

* more chill compaction logs
2018-12-17 14:20:28 +08:00
Atul Mohan 86e3ae5b48 Add fail message (#6720) 2018-12-11 08:05:50 -08:00
Gian Merlino b7709e1245 FileUtils: Sync directory entry too on writeAtomically. (#6677)
* FileUtils: Sync directory entry too on writeAtomically.

See the fsync(2) man page for why this is important:
https://linux.die.net/man/2/fsync

This also plumbs CompressionUtils's "zip" function through
writeAtomically, so the code for handling atomic local filesystem
writes is all done in the same place.

* Remove unused import.

* Avoid FileOutputStream.

* Allow non-atomic writes to overwrite.

* Add some comments. And no need to flush an unbuffered stream.
2018-12-08 17:12:59 +01:00
Furkan KAMACI bbb283fa34 Double-checked locking bugs (#6662)
* Double-checked locking bug is fixed.

* @Nullable is removed since there is no need to use along with @MonotonicNonNull.

* Static import is removed.

* Lazy initialization is implemented.

* Local variables used instead of volatile ones.

* Local variables used instead of volatile ones.
2018-12-07 17:10:29 +01:00
Jihoon Son d525e5b18e Fix travis timeout in BufferHashGrouperTest (#6713)
* Fix travis timeout in BufferHashGrouperTest

* adjust buffer size

* adjust bufferSize and loadFactor

* increase memory

* add debug code

* cat error

* after script

* print logs

* print per 2 min

* use direct mem

* clean up
2018-12-07 12:05:27 +08:00
Atul Mohan ec36f0b82f Add default comparison to HavingSpecMetricComparator for custom Aggregator types (#6505)
* Add default comparison

* Switch to BigDecimal comparison

* Add comparator from AggFactory

* Fix indent

* Add tests
2018-12-04 13:35:13 -08:00
Clint Wylie a1c9d0add2 autosize processing buffers based on direct memory sizing by default (#6588)
* autosize processing buffers based on direct memory sizing

* remove oops, more test

* max 1gb autosize buffers, test, start of docs

* fix oops

* revert accidental change

* print buffer size in exception

* change the things
2018-12-03 18:40:02 -07:00
Roman Leventov ec38df7575
Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606)
* Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns

* Fix style

* Fix HttpEmitterTest.timeoutEmptyQueue()

* Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests

* Clarify code
2018-12-01 01:12:56 +01:00
Mingming Qiu 849ba867b2 fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory (#6656) 2018-11-27 15:59:58 -08:00
Roman Leventov 887c645675
Find duplicate lines with checkstyle; enable some duplicate inspections in IntelliJ (#6558)
Not putting this to 0.13 milestone because the found bugs are not critical (one is a harmless DI config duplicate, and another is in a benchmark.

Change in `DumpSegment` is just an indentation change.
2018-11-26 16:55:42 +01:00
Roman Leventov 87b96fb1fd
Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Gian Merlino fe69da0d95
Expressions: Fix improper supplier reuse with missing columns. (#6600)
* Expressions: Fix improper supplier reuse with missing columns.

ExpressionSelectors has an optimization that skips building a Map
when there is only one input supplier. However, this optimization
should not be used in the case where the is one input supplier but
more than one input identifier (which can happen when only one
input identifier corresponds to an actual column).

Fixes #6556.

* Add underscores to statics.
2018-11-15 22:13:32 -08:00
David Lim 7b41e23cbb
remove backpressure time from DefaultQueryMetrics pending on-going discussion (#6631) 2018-11-15 19:29:50 -07:00
Roman Leventov 8f3fe9cd02 Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607)
* Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies

* Fix bug

* Replace checkstyle regexp with IntelliJ inspection
2018-11-15 13:21:34 -08:00
Jihoon Son cdae2fe7b5 Deprecate IntervalChunkingQueryRunner (#6591)
* Deprecate IntervalChunkingQueryRunner

* add doc

* deprecate metric

* fix doc
2018-11-14 06:33:27 +08:00
Gian Merlino 52f6bdc1eb
Optimization for expressions that hit a single long column. (#6599)
* Optimization for expressions that hit a single long column.

There was previously a single-long-input optimization that applied only
to the time column. These have been combined together. Also adds
type-specific value caching to ExprEval, which allowed simplifying
the SingleLongInputCachingExpressionColumnValueSelector code.

* Add more benchmarks.

* Don't use LRU cache for __time.

* Simplify a bit.

* Let the cache grow.
2018-11-13 09:36:32 -08:00
Roman Leventov 54351a5c75 Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490)
* Fix various bugs; Enable more IntelliJ inspections and update error-prone

* Fix NPE

* Fix inspections

* Remove unused imports
2018-11-06 14:38:08 -08:00
Roman Leventov a2a1a1c2c9 Hide NullDimensionSelector from public (#6480) 2018-11-02 04:38:21 -07:00
QiuMM 676f5e6d7f Prohibit some guava collection APIs and use JDK collection APIs directly (#6511)
* Prohibit some guava collection APIs and use JDK APIs directly

* reset files that changed by accident

* sort codestyle/druid-forbidden-apis.txt alphabetically
2018-10-29 13:02:43 +01:00
Samarth Jain 0a90b3d51a Remove unused code (#6504)
* Remove unused code

* Remove usage of list in setDimensions and setAggregatorSpecs

* Fix formatting to adhere to 120 character guideline
2018-10-26 11:31:10 -07:00
Roman Leventov 84ac18dc1b
Catch some incorrect method parameter or call argument formatting patterns with checkstyle (#6461)
* Catch some incorrect method parameter or call argument formatting patterns with checkstyle

* Fix DiscoveryModule

* Inline parameters_and_arguments.txt

* Fix a bug in PolyBind

* Fix formatting
2018-10-23 07:17:38 -03:00
Samarth Jain 359576a80b Implement force push down for nested group by query (#5471)
* Force nested query push down

* Code review changes
2018-10-22 13:43:47 -07:00
Roman Leventov 789c9a1dc7 Prohibit using Object|Long|Float|DoubleColumnSelector in instanceof statements (#6470)
* Prohibit using Object|Long|Float|DoubleColumnSelector in instanceof statements

* Doc fixes
2018-10-15 15:41:43 -07:00
robertervin 95ab1ea737 Fix Empty InDimFilter Failure (#6330)
* fix empty InDimFilter failure (#6101)

* Add test case for empty values input

* Add documentation for empty values in InDimFilter
2018-10-14 20:43:16 -07:00
Clint Wylie 84598fba3b combine druid-api, druid-common, java-util into druid-core (#6443)
* combine druid-api, druid-common, java-util

* spacing
2018-10-14 20:37:37 -07:00
dongyifeng b06ac54a5e add PrefixFilteredDimensionSpec for multi-value dimensions (#6307)
* add PrefixFilteredDimensionSpec for multi-value dimensions

* add docs for PrefixFilteredDimensionSpec

* remove unnecessary null handling

* add null check to the result of NullHandling
2018-10-12 17:51:09 -07:00
David Lim 20ab213ba6 change project versions to 0.13.0-incubating-SNAPSHOT (#6453) 2018-10-11 19:28:01 -07:00
Charles Allen c55b37d7ec Add optional `name` to top level of FilteredAggregatorFactory (#6219)
* Add optional `name` to top level of FilteredAggregatorFactory

* Add compat constructor for tests

* Address comments

* Add equals and hash code updates

* Rename test

* Fix imports and code style
2018-10-11 11:56:53 -07:00
Clint Wylie f7775d1db3 fixes for LookupReferencesManagerTest (#6444)
* some fixes for LookupReferencesManagerTest

* docs

* formatting

* more formatting fixes
2018-10-10 18:02:11 -07:00
Roman Leventov 09126c021a
Remove Aggregator.clone() methods (#6437)
* Remove Aggregator.clone() methods

* Remove CardinalityAggregator.name
2018-10-10 10:07:56 -03:00
QiuMM 0b8085aff7 Prohibit jackson ObjectMapper#reader methods which are deprecated (#6386)
* Prohibit jackson ObjectMapper#reader methods which are deprecated

* address comments
2018-10-03 17:55:20 -03:00
Roman Leventov 3ae563263a
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957)
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:
 - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
 - Generified `ComplexMetricExtractor`
2018-10-02 14:50:22 -03:00
Jihoon Son cb14a43038 Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask (#6393)
* Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask

* update doc and remove auto conversion

* remove remaining doc

* fix teamcity
2018-10-01 12:03:35 -07:00
Shiv Toolsidass 5a894f830b Added backpressure metric (#6335)
* Added backpressure metric

* Updated channelReadable to AtomicBoolean and fixed broken test

* Moved backpressure metric logic to NettyHttpClient

* Fix placement of calculating backPressureDuration
2018-09-29 14:24:04 -07:00
Jihoon Son f09e718c68 Implement MapVirtualColumn.makeDimensionSelector properly (#6396)
* Implement MapVirtualColumn.makeDimensionSelector properly

* address comments
2018-09-29 14:13:05 -07:00
Jihoon Son faf3f1e426 Fix cache keys of DefaultDimensionSpec and ExtractionDimensionSpec (#6390) 2018-09-26 20:08:53 -07:00
Nishant Bangarwa c9d281a2e9 Add ability to pass in Bloom filter from Hive Queries (#6222)
* Bloom filter initial implementation

fix checkstyle

review comments

Fix wierd failure

review comments

Revert "Fix wierd failure"

This reverts commit a13a83ad7887e679f6d539191b52aeaaea85b613.

* fix test

* review comment
2018-09-26 16:04:26 -07:00
Jonathan Wei 00b0a156e9
Tweak isInvalidRows behavior in HadoopTuningConfig (#6339)
* Tweak isInvalidRows behavior in HadoopTuningConfig

* Fix tests
2018-09-24 16:13:13 -07:00
Alexander Saydakov 93345064b5 HllSketch module (#5712)
* HllSketch module

* updated license and imports

* updated package name

* implemented makeAggregateCombiner()

* removed json marks

* style fix

* added module

* removed unnecessary import, side effect of package renaming

* use TreadLocalRandom

* addressing code review points, mostly formatting and comments

* javadoc

* natural order with nulls

* typo

* factored out raw input value extraction

* singleton

* style fix

* style fix

* use Collections.singletonList instead of Arrays.asList

* suppress warning
2018-09-24 08:41:56 -07:00
Jonathan Wei 609da01882 Fix dictionary ID race condition in IncrementalIndexStorageAdapter (#6340)
Possibly related to https://github.com/apache/incubator-druid/issues/4937

--------

There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following, when running queries with filters on String dimensions that hit realtime tasks: 

```
org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
	at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
```

When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, `StringDimensionIndexer.makeDimensionSelector` gets called eventually, which calls:

```
final int maxId = getCardinality();
...

 @Override
  public int getCardinality()
  {
    return dimLookup.size();
  }
```

So `maxId` is set to the size of the dictionary at the time that the `filterMatcher` is created.

However, the `maxRowIndex` which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049) is set after the `filterMatcher` is created.

If rows with new dictionary values are added after the `filterMatcher` is created but before `maxRowIndex` is set, then it is possible for the Cursor to return rows that contain the new values, which will have `id >= maxId`.

This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows with unknown dictionary IDs from being passed to the `filterMatcher`.

-----------

The included test triggers the error with a custom Filter + DruidPredicateFactory.

The DimensionSelector for predicate-based filter matching is created here in `Filters.makeValueMatcher`:

```
  public static ValueMatcher makeValueMatcher(
      final ColumnSelectorFactory columnSelectorFactory,
      final String columnName,
      final DruidPredicateFactory predicateFactory
  )
  {
    final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName);

    // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
    if (capabilities != null && capabilities.getType() == ValueType.LONG) {
      return getLongPredicateMatcher(
          columnSelectorFactory.makeColumnValueSelector(columnName),
          predicateFactory.makeLongPredicate()
      );
    }

    final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
        DimensionHandlerUtils.createColumnSelectorPlus(
            ValueMatcherColumnSelectorStrategyFactory.instance(),
            DefaultDimensionSpec.of(columnName),
            columnSelectorFactory
        );

    return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory);
  }
```

The test Filter adds a row to the IncrementalIndex in the test when the predicateFactory creates a new String predicate, after `DimensionHandlerUtils.createColumnSelectorPlus` is called.
2018-09-18 10:43:29 +04:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00