Commit Graph

1911 Commits

Author SHA1 Message Date
Senthil Kumar L S 371c672828 Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)
* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
2018-04-05 22:56:59 -07:00
Niketh Sabbineni 270fd1ea15 Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals

* Indentation issues
2018-04-05 22:12:30 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Niketh Sabbineni 912adcc284 ArrayAggregation: Use long to avoid overflow (#5544)
* ArrayAggregation: Use long to avoid overflow

* Add Tests
2018-03-28 16:37:53 -07:00
Jihoon Son 024e0a9cca Respect forceHashAggregation in queryContext (#5533)
* Respect forceHashAggregation in queryContext

* address comment
2018-03-28 14:15:38 -07:00
Atul Mohan ec17a44e09 Add result level caching to Brokers (#5028)
* Add result level caching to Brokers

* Minor doc changes

* Simplify sequences

*  Move etag execution

* Modify cacheLimit criteria

* Fix incorrect etag computation

* Fix docs

* Add separate query runner for result level caching

* Update docs

* Add post aggregated results to result level cache

* Fix indents

* Check byte size for exceeding cache limit

* Fix indents

* Fix indents

* Add flag for result caching

* Remove logs

* Make cache object generation synchronous

* Avoid saving intermediate cache results to list

* Fix changes that handle etag based response

* Release bytestream after use

*  Address PR comments

*  Discard resultcache stream after use

* Fix docs

* Address comments

* Add comment about fluent workflow issue
2018-03-23 19:11:52 -07:00
Jihoon Son 1ad898bde2
Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Clint Wylie 885b975c95 fix LongsColumnWithNulls and FloatsColumnWithNulls to override isNull in order to actually use nullValueBitmap (#5510) 2018-03-20 16:04:08 -07:00
Charles Allen 58f110f7f8 Future-proof some Guava usage (#5414)
* Future-proof some Guava usage

* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20

* Use `Collections.emptyIterator()`

* Pretty formatting

* Make listenable future transforms a thing in default druid

* Format fix

* Add forbidden guava apis

* Make the ListenableFutrues.transformAsync have comments

* Undo intellij bad pattern matching in comments

* Futrues --> Futures

* Add empty iterators forbidding

* Fix extra `A`

* Correct method signature

* Address review comments

* Finish Gian review comments

* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
2018-03-20 08:59:33 -07:00
Slim 17c71a2a60
Make Doubles aggregators use 64bits by default (#5478)
* use 64-bit float representation for double based aggregator

Change-Id: Ia4f442037052add178f6ac68138c9d52f96c6e09

* review comments

Change-Id: I5a588f7364f236bf22f2b138e9d743bfb27c67fe
2018-03-19 19:13:04 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Samarth Jain afa25202a3 Segment filtering should be done by looking at the inner most query o… (#5496)
* Segment filtering should be done by looking at the inner most query of a nested query

* Fixing checkstyle errors

* Addressing code review comments
2018-03-16 14:05:14 -07:00
Gian Merlino a08efe4683
Fix round robining in router. (#5500)
* Fix round robining in router.

Say that ten times fast.

For query endpoints, AsyncQueryForwardingServlet called hostFinder.getDefaultServer()
to set a default server, followed by hostFinder.getServer(inputQuery) to override it
with query-specific routing. Since hostFinder is round-robin, this skips a server.
When there are only two servers, one server is _always_ skipped and the router sends
all queries to the same broker.

* Adjust spacing.
2018-03-15 18:45:59 -07:00
Gian Merlino 16b81fcd53
SegmentMetadataQuery: Fix default interval handling. (#5489)
* SegmentMetadataQuery: Fix default interval handling.

PR #4131 introduced a new copy builder for segmentMetadata that did
not retain the value of usingDefaultInterval. This led to it being
dropped and the default-interval handling not working as expected.
Instead of using the default 1 week history when intervals are not
provided, the segmentMetadata query would query _all_ segments,
incurring an unexpected performance hit.

This patch fixes the bug and adds a test for the copy builder.

* Intervals
2018-03-15 10:05:46 -07:00
Niketh Sabbineni 40cc2c8740 Query should not fail because emitter fails or throws Exception (#5484) 2018-03-13 19:57:05 -07:00
Roman Leventov 6b158abe3f Enforce optimal IndexedInts iteration (#5456)
* Enforce optimal IndexedInts iteration

* Fix remaining suboptimal usages
2018-03-09 09:42:40 -08:00
Niraja Mishra ba3dbf2a42 Fixed NPE when dimension is null or empty. https://github.com/druid-io/druid/issues/3007 (#5299) 2018-03-05 16:27:35 -08:00
Gian Merlino 7416d1d02d Add "joda" option to timeFormat extractionFn. (#5448) 2018-03-02 19:59:26 -08:00
Jonathan Wei cf5f74b013 Fix GroupBy limit push down descending sorting on numeric columns (#5453) 2018-03-01 18:43:45 -08:00
Gian Merlino e4eaee3806
Support for disabling bitmap indexes. (#5402)
* Support for disabling bitmap indexes.

Can save space for columns where bitmap indexes are pointless (like
free-form text).

* Remove import.

* Fix CompactionTaskTest.

* Update for review comments.

* Review comments, tests.

* Fix test.
2018-02-28 19:19:56 -08:00
Niraja Mishra 0f009a41e1 Fixed PeriodGranularity for Asia pacific timezones (#5410) 2018-02-27 10:39:50 -08:00
Nishant Bangarwa 219e77aeac SQL compatible Null Handling Part - Expressions and Storage Changes (#5278)
* SQL compatible Null Handling Part - Expressions, Storage and Dimension Selector Changes

fix travis strict compilation

* fix teamcity error - remove unused method

* review comments

* review comments

* more comments

* review comments

* review comments

* Optimize isNull method

* Optimize isNull in ColumnarFloats/Longs/Doubles

* review comment - separate classes for null and non-null columns

fix intellij inspection

* remove unused import

* More Review comments

* improve comment

* More review comments

* fix checkstyle

* more review comments

* review comments.

fix javadoc links

remove Nullable from ConstantColumnValueSelector

* review comments.

* satisfy teamcity inspections
2018-02-21 13:27:26 +01:00
Jihoon Son deeda0dff2 Fix DefaultLimitSpec to respect sortByDimsFirst (#5385)
* Fix DefaultLimitSpec to respect sortByDimsFirst

* fix bug

* address comment
2018-02-16 15:26:32 -08:00
Roman Leventov e64ffb10c2 Standartize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino 971d45ab3f Use a separate snapshot file per lookup tier. (#5358)
Prevents conflicts if two processes on the same machine use the
same lookup snapshot directory but are in different tiers.
2018-02-07 11:28:53 -08:00
Gian Merlino e255d66b85
Fix two improper casts in HavingSpecMetricComparator. (#5352)
* Fix two improper casts in HavingSpecMetricComparator.

Fixes two things:

1. An improper double-to-long cast when comparing double metrics to any
   kind of value, which was a regression from #4883.
2. An improper double-to-long cast when comparing a long/int metric to a
   double/float value: the value was cast to long/int, drawing strange
   conclusions like int 100 matching a havingSpec of equalTo(100.5).

* Add comments.

* Remove extraneous comment.

* Simplify code a bit.
2018-02-06 13:18:55 -08:00
Gian Merlino c21ff6e81c
Properly set "identity" in query metrics. (#5330)
* Properly set "identity" in query metrics.

This patch adds an "identity" field to QueryPlus and sets it in
QueryLifecycle when the query starts executing. This is important
because it allows it to be used for future QueryMetrics created
by that QueryPlus object.

We also add "identity" to the request-level QueryMetrics object
created in emitLogsAndMetrics.

* Remove unused method.
2018-02-06 10:53:00 -08:00
Gian Merlino 8c738c7076 Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager (#5344)
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.

Both were susceptible to the following conditions:

1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
   other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.

* Use StringUtils.format
2018-02-06 09:44:06 -08:00
Slim 37c09ce3f8 Use both Joad Ids and Java IDs as Timezone to string readers (#5349)
* Use both Joad Ids and Java IDs as Timezone to string readers

Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3

* move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api

Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0

* exclude class and remove other usage

Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615

* remove it from test classes

Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4

* remove unused

Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052
2018-02-06 16:34:11 +05:30
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Jonathan Wei 285dedd126 More ParseException handling for numeric dimensions (#5312)
* Discard rows with unparseable numeric dimensions

* PR comments

* Don't throw away entire row on parse exception

* PR comments

* Fix import
2018-02-05 21:43:35 -08:00
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Himanshu 4cd47de62f add LookupExtractorFactory.destroy() method (#5287)
* add LookupExtractorFactory.destroy() method

* fix LookupReferencesManagerTest
2018-02-01 22:56:09 -08:00
Gian Merlino ed47a1e1a9
Lookups: Inherit "injective" from registered lookups, improve docs. (#5316)
Code changes:
- In the lookup-based extractionFns, inherit injective property from
  the lookup itself if not specified.

Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
  injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
  They are necessary and important since they work with filters and
  function cascades -- two things that the dimension specs do not do.
  They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
  more commonly used, so the docs read better this way.
2018-02-01 18:30:19 -08:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Roman Leventov 61e6878afd Check Javadoc reference integrity (#5279) 2018-01-22 13:51:28 -08:00
Roman Leventov a346bbc6f3 Enforce spacing around foreach colon with Checkstyle (#5271) 2018-01-22 11:48:51 -08:00
Roman Leventov f99c27e9e0 Fix bugs in ImmutableRTree; Merge bytebuffer-collections module into druid-processing (#5275)
* Fix bugs in ImmutableRTree; optimize ImmmutableRTreeObjectStrategy.writeTo(); Merge bytebuffer-collections module into druid-processing

* Remove unused declaration

* Fix another bug
2018-01-23 00:49:59 +05:30
Roman Leventov 87c744ac1d Add MethodParamPad, OneStatementPerLine and EmptyStatement Checkstyle checks (#5272) 2018-01-18 11:29:23 -08:00
Roman Leventov ad6cdf5d09
Reuse IndexedInts returned from DimensionSelector.getRow() implementations (#5172)
* Reuse IndexedInts in DimensionSelector implementations

* Remove BaseObjectColumnValueSelector.getObject() doc

* typo
2018-01-17 16:01:26 +01:00
Clint Wylie 491f8cca81 fix timewarp query results when using timezones and crossing DST transitions (#5157)
* timewarp and timezones
changes:
* `TimewarpOperator` will now compensate for daylight savings time shifts between date translation ranges for queries using a `PeriodGranularity` with a timezone defined
* introduces a new abstract query type `TimeBucketedQuery` for all queries which have a `Granularity` (100% not attached to this name). `GroupByQuery`, `SearchQuery`, `SelectQuery`, `TimeseriesQuery`, and `TopNQuery` all extend `TimeBucke
tedQuery`, cutting down on some duplicate code and providing a mechanism for `TimewarpOperator` (and anything else) that needs to be aware of granularity

* move precondition check to TimeBucketedQuery, add Granularities.nullToAll, add getTimezone to TimeBucketQuery

* formatting

* more formatting

* unused import

* changes:
* add 'getGranularity' and 'getTimezone' to 'Query' interface
* merge 'TimeBucketedQuery' into 'BaseQuery'
* fixup tests from resulting serialization changes

* dedupe

* fix after merge

* suppress warning
2018-01-11 12:39:33 -08:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Roman Leventov 535ec437e9 Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier (#5176)
* Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers

* Fix byte order
2018-01-05 16:08:07 +09:00
Jonathan Wei 935ac646f4
Upgrade to Calcite 1.15.0 (#5210)
* Upgrade to Calcite 1.15.0

* Use Filtration.eternity()
2018-01-04 12:11:24 -08:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Roman Leventov dc87e4fda1 Renamed IndexedFloats/Doubles/Longs to ColumnarFloats/..., IndexedMultivalue to ColumnarMultiInts, separate IndexedInts from ColumnarInts, many other renames for consistency in io.druid.segment.data package (#5171) 2017-12-20 18:50:07 -08:00
Clint Wylie 1181411901 small optimization in timeseries if 'skipEmptyBuckets' is true and cursor completed (#5178) 2017-12-19 16:47:00 -06:00
Roman Leventov f18eba50ee Remove Aggregator.reset() (#5177) 2017-12-19 14:09:17 -08:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00