Commit Graph

1887 Commits

Author SHA1 Message Date
Roman Leventov e64ffb10c2 Standartize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino 971d45ab3f Use a separate snapshot file per lookup tier. (#5358)
Prevents conflicts if two processes on the same machine use the
same lookup snapshot directory but are in different tiers.
2018-02-07 11:28:53 -08:00
Gian Merlino e255d66b85
Fix two improper casts in HavingSpecMetricComparator. (#5352)
* Fix two improper casts in HavingSpecMetricComparator.

Fixes two things:

1. An improper double-to-long cast when comparing double metrics to any
   kind of value, which was a regression from #4883.
2. An improper double-to-long cast when comparing a long/int metric to a
   double/float value: the value was cast to long/int, drawing strange
   conclusions like int 100 matching a havingSpec of equalTo(100.5).

* Add comments.

* Remove extraneous comment.

* Simplify code a bit.
2018-02-06 13:18:55 -08:00
Gian Merlino c21ff6e81c
Properly set "identity" in query metrics. (#5330)
* Properly set "identity" in query metrics.

This patch adds an "identity" field to QueryPlus and sets it in
QueryLifecycle when the query starts executing. This is important
because it allows it to be used for future QueryMetrics created
by that QueryPlus object.

We also add "identity" to the request-level QueryMetrics object
created in emitLogsAndMetrics.

* Remove unused method.
2018-02-06 10:53:00 -08:00
Gian Merlino 8c738c7076 Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager (#5344)
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.

Both were susceptible to the following conditions:

1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
   other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.

* Use StringUtils.format
2018-02-06 09:44:06 -08:00
Slim 37c09ce3f8 Use both Joad Ids and Java IDs as Timezone to string readers (#5349)
* Use both Joad Ids and Java IDs as Timezone to string readers

Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3

* move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api

Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0

* exclude class and remove other usage

Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615

* remove it from test classes

Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4

* remove unused

Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052
2018-02-06 16:34:11 +05:30
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Jonathan Wei 285dedd126 More ParseException handling for numeric dimensions (#5312)
* Discard rows with unparseable numeric dimensions

* PR comments

* Don't throw away entire row on parse exception

* PR comments

* Fix import
2018-02-05 21:43:35 -08:00
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Himanshu 4cd47de62f add LookupExtractorFactory.destroy() method (#5287)
* add LookupExtractorFactory.destroy() method

* fix LookupReferencesManagerTest
2018-02-01 22:56:09 -08:00
Gian Merlino ed47a1e1a9
Lookups: Inherit "injective" from registered lookups, improve docs. (#5316)
Code changes:
- In the lookup-based extractionFns, inherit injective property from
  the lookup itself if not specified.

Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
  injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
  They are necessary and important since they work with filters and
  function cascades -- two things that the dimension specs do not do.
  They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
  more commonly used, so the docs read better this way.
2018-02-01 18:30:19 -08:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Roman Leventov 61e6878afd Check Javadoc reference integrity (#5279) 2018-01-22 13:51:28 -08:00
Roman Leventov a346bbc6f3 Enforce spacing around foreach colon with Checkstyle (#5271) 2018-01-22 11:48:51 -08:00
Roman Leventov f99c27e9e0 Fix bugs in ImmutableRTree; Merge bytebuffer-collections module into druid-processing (#5275)
* Fix bugs in ImmutableRTree; optimize ImmmutableRTreeObjectStrategy.writeTo(); Merge bytebuffer-collections module into druid-processing

* Remove unused declaration

* Fix another bug
2018-01-23 00:49:59 +05:30
Roman Leventov 87c744ac1d Add MethodParamPad, OneStatementPerLine and EmptyStatement Checkstyle checks (#5272) 2018-01-18 11:29:23 -08:00
Roman Leventov ad6cdf5d09
Reuse IndexedInts returned from DimensionSelector.getRow() implementations (#5172)
* Reuse IndexedInts in DimensionSelector implementations

* Remove BaseObjectColumnValueSelector.getObject() doc

* typo
2018-01-17 16:01:26 +01:00
Clint Wylie 491f8cca81 fix timewarp query results when using timezones and crossing DST transitions (#5157)
* timewarp and timezones
changes:
* `TimewarpOperator` will now compensate for daylight savings time shifts between date translation ranges for queries using a `PeriodGranularity` with a timezone defined
* introduces a new abstract query type `TimeBucketedQuery` for all queries which have a `Granularity` (100% not attached to this name). `GroupByQuery`, `SearchQuery`, `SelectQuery`, `TimeseriesQuery`, and `TopNQuery` all extend `TimeBucke
tedQuery`, cutting down on some duplicate code and providing a mechanism for `TimewarpOperator` (and anything else) that needs to be aware of granularity

* move precondition check to TimeBucketedQuery, add Granularities.nullToAll, add getTimezone to TimeBucketQuery

* formatting

* more formatting

* unused import

* changes:
* add 'getGranularity' and 'getTimezone' to 'Query' interface
* merge 'TimeBucketedQuery' into 'BaseQuery'
* fixup tests from resulting serialization changes

* dedupe

* fix after merge

* suppress warning
2018-01-11 12:39:33 -08:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Roman Leventov 535ec437e9 Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier (#5176)
* Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers

* Fix byte order
2018-01-05 16:08:07 +09:00
Jonathan Wei 935ac646f4
Upgrade to Calcite 1.15.0 (#5210)
* Upgrade to Calcite 1.15.0

* Use Filtration.eternity()
2018-01-04 12:11:24 -08:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Roman Leventov dc87e4fda1 Renamed IndexedFloats/Doubles/Longs to ColumnarFloats/..., IndexedMultivalue to ColumnarMultiInts, separate IndexedInts from ColumnarInts, many other renames for consistency in io.druid.segment.data package (#5171) 2017-12-20 18:50:07 -08:00
Clint Wylie 1181411901 small optimization in timeseries if 'skipEmptyBuckets' is true and cursor completed (#5178) 2017-12-19 16:47:00 -06:00
Roman Leventov f18eba50ee Remove Aggregator.reset() (#5177) 2017-12-19 14:09:17 -08:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Charles Allen 4365390310 Remove duplicate fastutil dependency in processing pom.xml (#5142) 2017-12-07 08:54:48 +09:00
Alexander Saydakov 45f91a241e numeric quantiles sketch aggregator (#5002)
* numeric quantiles sketch aggregator

* it seems that we need to synchronize all methods, which modify the state

* Seems like a false positive with -Pstrict

* code style fix

* code style fix

* use sketches-core-0.10.3

* moved cache ids to the central place

* better class names

* support large columns

* explained autodetection, added exception

* added comments regarding sketches moving on heap

* support reindexing

* implemented suggestions from jihoonson

* style fix

* use max(k, other.k) for better accuracy

* check for NilColumnValueSelector instead of null

* throw exceptions instead of providing no-op comparators
2017-12-06 08:18:08 +09:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implemetations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
Roman Leventov aacc57131b Don't use HistoricalXxxPrototypes in PooledTopNAlgorithm when Cursor's offset is FilteredOffset (#5133)
* Don't use HistoricalXxxPrototypes in PooledTopNAlgorithm when Cursor's offset is FilteredOffset, because it doens't support cloning

* Add test
2017-12-01 17:04:22 -08:00
Gian Merlino 5f6bdd940b SQL: Improve translation of time floor expressions. (#5107)
* SQL: Improve translation of time floor expressions.

The main change is to TimeFloorOperatorConversion.applyTimestampFloor.

- Prefer timestamp_floor expressions to timeFormat extractionFns, to
  avoid turning things into strings when it isn't necessary.
- Collapse CAST(FLOOR(X TO Y) AS DATE) to FLOOR(X TO Y) if appropriate.

* Fix tests.
2017-11-29 12:06:03 -08:00
Chuanlei Ni 368d03146b assign granularity.all to SelectQuery by default (#5091) 2017-11-21 17:10:19 -08:00
zhangxinyu1 590633c595 fix bug in method reset of ByteBufferHashTable.java (#5100) 2017-11-17 13:19:16 -03:00
Jonathan Wei ec6774039e
Fix numLookupLoadingThreads default value (#5097) 2017-11-16 15:13:52 -08:00
Gian Merlino 77df5e0673 ExpressionSelectors: Add optimized selectors. (#5048)
* ExpressionSelectors: Add caching selectors.

- SingleLongInputCaching selector for expressions on the __time column,
  using a similar optimization to SingleScanTimeDimSelector
- SingleStringInputDimensionSelector for expressions on string columns
  that return strings, using a similar optimization to ExtractionFn
  based DimensionSelectors.
- SingleStringInputCaching selector for expressions on string columns
  that return primitives.

Also, in the SQL planner, prefer expressions for time operations
rather than extractionFns.

* Code review comments.
2017-11-13 20:24:24 -08:00
Akash Dwivedi c1538f29fc maxQueryTimeout property in runtime properties. (#4852)
* maxQueryTimeout property in runtime properties.

* extra line

* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.

* Fix initialize method.

* remove unused import.

* doc update.

* some more details in doc about query failure..

* minor fix.

* decorating QueryRunner to set and verify context. Added by servers.

* remove whitespace.
2017-11-13 19:23:11 -06:00
Gian Merlino 9444da5038 SQL: Improved behavior when implicitly casting strings to date/time literals. (#5023)
* SQL: Improved behavior when implicitly casting strings to date/time literals.

- Handle all flavors of ISO8601 and SQL literals.
- Throw errors on other literals instead of silently transforming them to 0.

* Respect timeZone when format is null.
2017-11-10 17:43:22 +09:00
Roman Leventov 3541b7544b Prohibit and remove unused declarations in the processing module (#4930)
* Prohibit and remove unused declarations in the processing module

* Fix tests

* Fix integration tests

* Suppress unused

* Try to remove SuppressWarnings unused in VirtualColumn

* Remove reset 'false positives'

* Annotate CliCommandCreator as ExtensionPoint

* Unused import warning instead of error in IntelliJ

* Fixes

* Add comment

* Fix AzureBlob

* Fix CloudFilesBlob

* Address comments

* Add Project SDK section to INTELLIJ_SETUP.md

* Fix image
2017-11-09 09:27:27 -08:00
Roman Leventov a8dc056c09
Add retries for coordinator fetch and lookup start in LookupReferencesManager (#5029)
* Add retries for coordinator fetch and lookup start in LookupReferencesManager

* Fix LookupConfigTest

* Address comments

* Address more comments

* And address more comments

* Address comms

* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
2017-11-09 02:30:36 -03:00
Jihoon Son 5f3c863d5e Add compaction task (#4985)
* Add compaction task

* added doc

* use combining aggregators

* address comments

* add support for dimensionsSpec

* fix getUniqueDims and getUniqueMetics

* find unique dimensionsSpec

* fix compilation

* add unit test

* fix test

* fix test

* test for different dimension orderings and types, and doc for type and ordering

* add control for custom ordering and type

* update doc

* fix compile

* fix compile

* add segments param

* fix serde error

* fix build
2017-11-03 21:55:27 -06:00
Roman Leventov 5eb08c27cb Add Emitter monitoring (#4973)
* Add Emitter monitoring

* Fix typo

* Fixes

* testing new emitter

* Fix failed test (#71)

* testing new emitter

* fix on failed test

* Remove emitter's readTimeout from docs

* Update docs

* Add HttpEmittingMonitor

* Update java-util to 1.3.2
2017-11-03 21:27:57 -06:00
Gian Merlino 6c725a7e06 Fix havingSpec on complex aggregators. (#5024)
* Fix havingSpec on complex aggregators.

- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
  related classes from druid-server to druid-processing. They probably make
  more sense there anyway.
- Adds a SQL query test.

Fixes #4957.

* Remove unused import.
2017-11-01 12:58:08 -04:00
Gian Merlino 1df458b35e Fix improper handling of empty arrays in StringDimensionIndexer. (#5012)
* Fix improper handling of empty arrays in StringDimensionIndexer.

This bug was able to introduce data errors: if the input rows to an
IncrementalIndex contained entirely empty arrays and single values, then
upon persisting to disk, the empty arrays would be replaced with the
lexicographically smallest single value, rather than nulls like they
should have been.

* Style fix.

* Add tests for bitmap indexes too.
2017-10-27 14:21:48 -07:00
Jihoon Son d7024f22e1 Upgrade fastutil to 8.1.0 (#4988)
* Upgrade failutil to 8.1.0

* unused import
2017-10-19 23:37:43 -05:00
Jihoon Son 52d7f74226 Add streaming aggregation as the last step of ConcurrentGrouper if data are spilled (#4704)
* Add steaming grouper

* Fix doc

* Use a single dictionary while combining

* Revert GroupByBenchmark

* Removed unused code

* More cleanup

* Remove unused config

* Fix some typos and bugs

* Refactor Groupers.mergeIterators()

* Add comments for combining tree

* Refactor buildCombineTree

* Refactor iterator

* Add ParallelCombiner

* Add ParallelCombinerTest

* Handle InterruptedException

* use AbstractPrioritizedCallable

* Address comments

* [maven-release-plugin] prepare release druid-0.11.0-sg

* [maven-release-plugin] prepare for next development iteration

* Address comments

* Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit 5c6b31e488.

* Revert "[maven-release-plugin] prepare release druid-0.11.0-sg"

This reverts commit 0f5c3a8b82.

* Fix build failure

* Change list to array

* rename sortableIds

* Address comments

* change to foreach loop

* Fix comment

* Revert keyEquals()

* Remove loop

* Address comments

* Fix build fail

* Address comments

* Remove unused imports

* Fix method name

* Split intermediate and leaf combine degrees

* Add comments to StreamingMergeSortedGrouper

* Add more comments and fix overflow

* Address comments

* ConcurrentGrouperTest cleanup

* add thread number configuration for parallel combining

* improve doc

* address comments

* fix build
2017-10-17 23:24:08 -07:00
Slim af2bc5f814 Make float default representation for DoubleSum/Min/Max aggregators (#4944)
* Introduce System wide property to select how to store double.
Set the default to store as float

Change-Id: Id85cca04ed0e7ecbce78624168c586dcc2adafaa

* fix tests

Change-Id: Ib42db724b8a8f032d204b58c366caaeabdd0d939

* Change the property name

Change-Id: I3ed69f79fc56e3735bc8f3a097f52a9f932b4734

* add tests and make default distribution store doubles as 64bits

Change-Id: I237b07829117ac61e247a6124423b03992f550f2

* adding mvn argument to parallel-test profile

Change-Id: Iae5d1328f901c4876b133894fa37e0d9a4162b05

* move property name and helper function to io.druid.segment.column.Column

Change-Id: I62ea903d332515de2b7ca45c02587a1b015cb065

* fix docs and clean style

Change-Id: I726abb8f52d25dc9dc62ad98814c5feda5e4d065

* fix docs

Change-Id: If10f4cf1e51a58285a301af4107ea17fe5e09b6d
2017-10-16 17:17:22 -07:00
Himanshu a7e802c9d4 greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..) (#4883)
* greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..)

* fix the unit test and expect having to work on hyperUnique agg

* test fix

* fix style errors
2017-10-16 12:02:30 -07:00
Roman Leventov dc7cb117a1 Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886)
* Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism

* Fix MapVirtualColumn.makeColumnValueSelector()

* Minor fixes

* Fix IndexGeneratorCombinerTest

* DimensionSelector to return zeros when treated as numeric ColumnValueSelector

* Fix IncrementalIndexTest

* Fix IncrementalIndex.makeColumnSelectorFactory()

* Optimize MapBasedRow.getMetric()

* Fix VarianceAggregatorTest

* Simplify IncrementalIndex.makeColumnSelectorFactory()

* Address comments

* More comments

* Test
2017-10-13 21:44:17 -05:00