1910 Commits

Author SHA1 Message Date
Niketh Sabbineni
270fd1ea15 Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals

* Indentation issues
2018-04-05 22:12:30 -07:00
Jonathan Wei
969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Kirill Kozlov
8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Niketh Sabbineni
912adcc284 ArrayAggregation: Use long to avoid overflow (#5544)
* ArrayAggregation: Use long to avoid overflow

* Add Tests
2018-03-28 16:37:53 -07:00
Jihoon Son
024e0a9cca Respect forceHashAggregation in queryContext (#5533)
* Respect forceHashAggregation in queryContext

* address comment
2018-03-28 14:15:38 -07:00
Atul Mohan
ec17a44e09 Add result level caching to Brokers (#5028)
* Add result level caching to Brokers

* Minor doc changes

* Simplify sequences

*  Move etag execution

* Modify cacheLimit criteria

* Fix incorrect etag computation

* Fix docs

* Add separate query runner for result level caching

* Update docs

* Add post aggregated results to result level cache

* Fix indents

* Check byte size for exceeding cache limit

* Fix indents

* Fix indents

* Add flag for result caching

* Remove logs

* Make cache object generation synchronous

* Avoid saving intermediate cache results to list

* Fix changes that handle etag based response

* Release bytestream after use

*  Address PR comments

*  Discard resultcache stream after use

* Fix docs

* Address comments

* Add comment about fluent workflow issue
2018-03-23 19:11:52 -07:00
Jihoon Son
1ad898bde2
Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Clint Wylie
885b975c95 fix LongsColumnWithNulls and FloatsColumnWithNulls to override isNull in order to actually use nullValueBitmap (#5510) 2018-03-20 16:04:08 -07:00
Charles Allen
58f110f7f8 Future-proof some Guava usage (#5414)
* Future-proof some Guava usage

* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20

* Use `Collections.emptyIterator()`

* Pretty formatting

* Make listenable future transforms a thing in default druid

* Format fix

* Add forbidden guava apis

* Make the ListenableFutrues.transformAsync have comments

* Undo intellij bad pattern matching in comments

* Futrues --> Futures

* Add empty iterators forbidding

* Fix extra `A`

* Correct method signature

* Address review comments

* Finish Gian review comments

* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
2018-03-20 08:59:33 -07:00
Slim
17c71a2a60
Make Doubles aggregators use 64bits by default (#5478)
* use 64-bit float representation for double based aggregator

Change-Id: Ia4f442037052add178f6ac68138c9d52f96c6e09

* review comments

Change-Id: I5a588f7364f236bf22f2b138e9d743bfb27c67fe
2018-03-19 19:13:04 -07:00
Roman Leventov
693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Samarth Jain
afa25202a3 Segment filtering should be done by looking at the inner most query o… (#5496)
* Segment filtering should be done by looking at the inner most query of a nested query

* Fixing checkstyle errors

* Addressing code review comments
2018-03-16 14:05:14 -07:00
Gian Merlino
a08efe4683
Fix round robining in router. (#5500)
* Fix round robining in router.

Say that ten times fast.

For query endpoints, AsyncQueryForwardingServlet called hostFinder.getDefaultServer()
to set a default server, followed by hostFinder.getServer(inputQuery) to override it
with query-specific routing. Since hostFinder is round-robin, this skips a server.
When there are only two servers, one server is _always_ skipped and the router sends
all queries to the same broker.

* Adjust spacing.
2018-03-15 18:45:59 -07:00
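
The skipped-server effect described in the round-robin fix (#5500) can be reproduced with a small sketch. `RoundRobinFinder` is a hypothetical stand-in for the host finder, not the actual Druid class; the only assumption carried over from the commit message is that each lookup advances a shared round-robin cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RoundRobinSkipDemo {
    /** Hypothetical round-robin selector: every pick() advances the cursor. */
    public static class RoundRobinFinder {
        private final List<String> servers;
        private int next = 0;

        public RoundRobinFinder(List<String> servers) {
            this.servers = servers;
        }

        public String pick() {
            String server = servers.get(next);
            next = (next + 1) % servers.size();
            return server;
        }
    }

    /** Buggy pattern: pick a default server, then pick again to override it. */
    public static List<String> routeBuggy(List<String> servers, int queries) {
        RoundRobinFinder finder = new RoundRobinFinder(servers);
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < queries; i++) {
            String server = finder.pick(); // default server, advances the cursor
            server = finder.pick();        // query-specific override, advances it again
            chosen.add(server);
        }
        return chosen;
    }

    /** Fixed pattern: exactly one pick per query. */
    public static List<String> routeFixed(List<String> servers, int queries) {
        RoundRobinFinder finder = new RoundRobinFinder(servers);
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < queries; i++) {
            chosen.add(finder.pick());
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList("broker-a", "broker-b");
        System.out.println(routeBuggy(brokers, 4)); // [broker-b, broker-b, broker-b, broker-b]
        System.out.println(routeFixed(brokers, 4)); // [broker-a, broker-b, broker-a, broker-b]
    }
}
```

With two servers, the double pick always lands on the same one, which matches the "all queries go to the same broker" symptom in the commit message.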
Gian Merlino
16b81fcd53
SegmentMetadataQuery: Fix default interval handling. (#5489)
* SegmentMetadataQuery: Fix default interval handling.

PR #4131 introduced a new copy builder for segmentMetadata that did
not retain the value of usingDefaultInterval. This led to it being
dropped and the default-interval handling not working as expected.
Instead of using the default 1 week history when intervals are not
provided, the segmentMetadata query would query _all_ segments,
incurring an unexpected performance hit.

This patch fixes the bug and adds a test for the copy builder.

* Intervals
2018-03-15 10:05:46 -07:00
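
The kind of bug described in the SegmentMetadataQuery fix (#5489) is easy to introduce in any copy builder. The sketch below is a simplified illustration only: `Query`, `Builder`, `copyBuggy`, and `copyFixed` are hypothetical names, and only the field name `usingDefaultInterval` comes from the commit message.

```java
public class CopyBuilderDemo {
    /** Hypothetical, heavily simplified query with two fields. */
    public static final class Query {
        public final String dataSource;
        public final boolean usingDefaultInterval;

        public Query(String dataSource, boolean usingDefaultInterval) {
            this.dataSource = dataSource;
            this.usingDefaultInterval = usingDefaultInterval;
        }
    }

    public static final class Builder {
        private String dataSource;
        private boolean usingDefaultInterval; // silently defaults to false

        public Builder dataSource(String value) {
            this.dataSource = value;
            return this;
        }

        public Builder usingDefaultInterval(boolean value) {
            this.usingDefaultInterval = value;
            return this;
        }

        public Query build() {
            return new Query(dataSource, usingDefaultInterval);
        }
    }

    /** Buggy copy: forgets one field, so the flag is reset to false. */
    public static Builder copyBuggy(Query query) {
        return new Builder().dataSource(query.dataSource);
    }

    /** Fixed copy: carries every field of the source query over. */
    public static Builder copyFixed(Query query) {
        return new Builder()
            .dataSource(query.dataSource)
            .usingDefaultInterval(query.usingDefaultInterval);
    }
}
```

A test that copies a query and compares every field against the original, as the patch's added copy-builder test presumably does, catches this class of omission.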
Niketh Sabbineni
40cc2c8740 Query should not fail because emitter fails or throws Exception (#5484) 2018-03-13 19:57:05 -07:00
Roman Leventov
6b158abe3f Enforce optimal IndexedInts iteration (#5456)
* Enforce optimal IndexedInts iteration

* Fix remaining suboptimal usages
2018-03-09 09:42:40 -08:00
Niraja Mishra
ba3dbf2a42 Fixed NPE when dimension is null or empty. https://github.com/druid-io/druid/issues/3007 (#5299) 2018-03-05 16:27:35 -08:00
Gian Merlino
7416d1d02d Add "joda" option to timeFormat extractionFn. (#5448) 2018-03-02 19:59:26 -08:00
Jonathan Wei
cf5f74b013 Fix GroupBy limit push down descending sorting on numeric columns (#5453) 2018-03-01 18:43:45 -08:00
Gian Merlino
e4eaee3806
Support for disabling bitmap indexes. (#5402)
* Support for disabling bitmap indexes.

Can save space for columns where bitmap indexes are pointless (like
free-form text).

* Remove import.

* Fix CompactionTaskTest.

* Update for review comments.

* Review comments, tests.

* Fix test.
2018-02-28 19:19:56 -08:00
Niraja Mishra
0f009a41e1 Fixed PeriodGranularity for Asia pacific timezones (#5410) 2018-02-27 10:39:50 -08:00
Nishant Bangarwa
219e77aeac SQL compatible Null Handling Part - Expressions and Storage Changes (#5278)
* SQL compatible Null Handling Part - Expressions, Storage and Dimension Selector Changes

fix travis strict compilation

* fix teamcity error - remove unused method

* review comments

* review comments

* more comments

* review comments

* review comments

* Optimize isNull method

* Optimize isNull in ColumnarFloats/Longs/Doubles

* review comment - separate classes for null and non-null columns

fix intellij inspection

* remove unused import

* More Review comments

* improve comment

* More review comments

* fix checkstyle

* more review comments

* review comments.

fix javadoc links

remove Nullable from ConstantColumnValueSelector

* review comments.

* satisfy teamcity inspections
2018-02-21 13:27:26 +01:00
Jihoon Son
deeda0dff2 Fix DefaultLimitSpec to respect sortByDimsFirst (#5385)
* Fix DefaultLimitSpec to respect sortByDimsFirst

* fix bug

* address comment
2018-02-16 15:26:32 -08:00
Roman Leventov
e64ffb10c2 Standartize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino
971d45ab3f Use a separate snapshot file per lookup tier. (#5358)
Prevents conflicts if two processes on the same machine use the
same lookup snapshot directory but are in different tiers.
2018-02-07 11:28:53 -08:00
Gian Merlino
e255d66b85
Fix two improper casts in HavingSpecMetricComparator. (#5352)
* Fix two improper casts in HavingSpecMetricComparator.

Fixes two things:

1. An improper double-to-long cast when comparing double metrics to any
   kind of value, which was a regression from #4883.
2. An improper double-to-long cast when comparing a long/int metric to a
   double/float value: the value was cast to long/int, drawing strange
   conclusions like int 100 matching a havingSpec of equalTo(100.5).

* Add comments.

* Remove extraneous comment.

* Simplify code a bit.
2018-02-06 13:18:55 -08:00
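
The two cast problems listed in the HavingSpecMetricComparator fix (#5352) can be illustrated with plain Java arithmetic. This is a minimal sketch, not the code from the patch: the method names are hypothetical, and comparing in double space is just one possible remedy (it can lose precision for long values beyond 2^53).

```java
public class HavingCastDemo {
    /** Buggy: truncates the double value to long before comparing. */
    public static int compareLongMetricBuggy(long metric, double value) {
        return Long.compare(metric, (long) value);
    }

    /** One possible fix: widen the long metric and compare as doubles. */
    public static int compareLongMetricFixed(long metric, double value) {
        return Double.compare((double) metric, value);
    }

    /** Buggy: truncates both sides, so distinct fractions collapse to 0. */
    public static int compareDoubleMetricBuggy(double metric, double value) {
        return Long.compare((long) metric, (long) value);
    }

    /** Fixed: compares the doubles directly. */
    public static int compareDoubleMetricFixed(double metric, double value) {
        return Double.compare(metric, value);
    }

    public static void main(String[] args) {
        // int 100 wrongly "equals" 100.5 after truncation
        System.out.println(compareLongMetricBuggy(100L, 100.5) == 0);   // true
        System.out.println(compareLongMetricFixed(100L, 100.5) == 0);   // false
        // 0.7 and 0.2 both truncate to 0 and wrongly compare as equal
        System.out.println(compareDoubleMetricBuggy(0.7, 0.2) == 0);    // true
        System.out.println(compareDoubleMetricFixed(0.7, 0.2) > 0);     // true
    }
}
```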
Gian Merlino
c21ff6e81c
Properly set "identity" in query metrics. (#5330)
* Properly set "identity" in query metrics.

This patch adds an "identity" field to QueryPlus and sets it in
QueryLifecycle when the query starts executing. This is important
because it allows it to be used for future QueryMetrics created
by that QueryPlus object.

We also add "identity" to the request-level QueryMetrics object
created in emitLogsAndMetrics.

* Remove unused method.
2018-02-06 10:53:00 -08:00
Gian Merlino
8c738c7076 Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager (#5344)
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.

Both were susceptible to the following conditions:

1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
   other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.

* Use StringUtils.format
2018-02-06 09:44:06 -08:00
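
Both conditions listed in the race fix (#5344) are commonly avoided by writing to a temporary file in the same directory and then renaming it over the target. The sketch below shows that general technique using only `java.nio.file`; it is an assumption about the approach, not the code from the patch, and `writeAtomically` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicWriteDemo {
    /**
     * Writes contents to a temp file next to the target, then moves it into place.
     * Readers see either the old file or the complete new one, never a partial write.
     */
    public static void writeAtomically(Path target, byte[] contents) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file must live in the same directory (same file system)
        // for the final move to be atomic.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString() + ".", ".tmp");
        try {
            Files.write(tmp, contents);
            // Whether an existing target is replaced under ATOMIC_MOVE is
            // implementation specific; on POSIX file systems it is.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

A crash before the move leaves only a stray temp file behind, and two writers never interleave bytes in the target because each works on its own temp file.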
Slim
37c09ce3f8 Use both Joad Ids and Java IDs as Timezone to string readers (#5349)
* Use both Joad Ids and Java IDs as Timezone to string readers

Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3

* move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api

Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0

* exclude class and remove other usage

Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615

* remove it from test classes

Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4

* remove unused

Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052
2018-02-06 16:34:11 +05:30
Gian Merlino
9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Jonathan Wei
285dedd126 More ParseException handling for numeric dimensions (#5312)
* Discard rows with unparseable numeric dimensions

* PR comments

* Don't throw away entire row on parse exception

* PR comments

* Fix import
2018-02-05 21:43:35 -08:00
Gian Merlino
7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Himanshu
4cd47de62f add LookupExtractorFactory.destroy() method (#5287)
* add LookupExtractorFactory.destroy() method

* fix LookupReferencesManagerTest
2018-02-01 22:56:09 -08:00
Gian Merlino
ed47a1e1a9
Lookups: Inherit "injective" from registered lookups, improve docs. (#5316)
Code changes:
- In the lookup-based extractionFns, inherit injective property from
  the lookup itself if not specified.

Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
  injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
  They are necessary and important since they work with filters and
  function cascades -- two things that the dimension specs do not do.
  They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
  more commonly used, so the docs read better this way.
2018-02-01 18:30:19 -08:00
Jonathan Wei
80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Roman Leventov
61e6878afd Check Javadoc reference integrity (#5279) 2018-01-22 13:51:28 -08:00
Roman Leventov
a346bbc6f3 Enforce spacing around foreach colon with Checkstyle (#5271) 2018-01-22 11:48:51 -08:00
Roman Leventov
f99c27e9e0 Fix bugs in ImmutableRTree; Merge bytebuffer-collections module into druid-processing (#5275)
* Fix bugs in ImmutableRTree; optimize ImmmutableRTreeObjectStrategy.writeTo(); Merge bytebuffer-collections module into druid-processing

* Remove unused declaration

* Fix another bug
2018-01-23 00:49:59 +05:30
Roman Leventov
87c744ac1d Add MethodParamPad, OneStatementPerLine and EmptyStatement Checkstyle checks (#5272) 2018-01-18 11:29:23 -08:00
Roman Leventov
ad6cdf5d09
Reuse IndexedInts returned from DimensionSelector.getRow() implementations (#5172)
* Reuse IndexedInts in DimensionSelector implementations

* Remove BaseObjectColumnValueSelector.getObject() doc

* typo
2018-01-17 16:01:26 +01:00
Clint Wylie
491f8cca81 fix timewarp query results when using timezones and crossing DST transitions (#5157)
* timewarp and timezones
changes:
* `TimewarpOperator` will now compensate for daylight savings time shifts between date translation ranges for queries using a `PeriodGranularity` with a timezone defined
* introduces a new abstract query type `TimeBucketedQuery` for all queries which have a `Granularity` (100% not attached to this name). `GroupByQuery`, `SearchQuery`, `SelectQuery`, `TimeseriesQuery`, and `TopNQuery` all extend `TimeBucketedQuery`, cutting down on some duplicate code and providing a mechanism for `TimewarpOperator` (and anything else) that needs to be aware of granularity

* move precondition check to TimeBucketedQuery, add Granularities.nullToAll, add getTimezone to TimeBucketQuery

* formatting

* more formatting

* unused import

* changes:
* add 'getGranularity' and 'getTimezone' to 'Query' interface
* merge 'TimeBucketedQuery' into 'BaseQuery'
* fixup tests from resulting serialization changes

* dedupe

* fix after merge

* suppress warning
2018-01-11 12:39:33 -08:00
Roman Leventov
8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Roman Leventov
535ec437e9 Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier (#5176)
* Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers

* Fix byte order
2018-01-05 16:08:07 +09:00
Jonathan Wei
935ac646f4
Upgrade to Calcite 1.15.0 (#5210)
* Upgrade to Calcite 1.15.0

* Use Filtration.eternity()
2018-01-04 12:11:24 -08:00
Roman Leventov
579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Roman Leventov
dc87e4fda1 Renamed IndexedFloats/Doubles/Longs to ColumnarFloats/..., IndexedMultivalue to ColumnarMultiInts, separate IndexedInts from ColumnarInts, many other renames for consistency in io.druid.segment.data package (#5171) 2017-12-20 18:50:07 -08:00
Clint Wylie
1181411901 small optimization in timeseries if 'skipEmptyBuckets' is true and cursor completed (#5178) 2017-12-19 16:47:00 -06:00
Roman Leventov
f18eba50ee Remove Aggregator.reset() (#5177) 2017-12-19 14:09:17 -08:00
Roman Leventov
5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Roman Leventov
64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00