Commit Graph

161 Commits

Author SHA1 Message Date
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 0ae4aba4e2 HdfsDataSegmentPusher: Close tmpIndexFile before copying it. (#5873)
It seems that copy-before-close works OK on HDFS, but it doesn't work
on all filesystems. In particular, we observed this not working properly
with Google Cloud Storage. And anyway, it's better hygiene to close files
before attempting to copy them somewhere else.
2018-06-12 08:58:48 +01:00
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Jonathan Wei 513fab77d9 Lazy init of fullyQualifiedStorageDirectory in HDFS pusher (#5684)
* Lazy init of fullyQualifiedStorageDirectory in HDFS pusher

* Comment

* Fix test

* PR comments
2018-04-28 21:07:39 -07:00
Jonathan Wei 723f7ac550
Add support for task reports, upload reports to deep storage (#5524)
* Add support for task reports, upload reports to deep storage

* PR comments

* Better name for method

* Fix report file upload

* Use TaskReportFileWriter

* Checkstyle

* More PR comments
2018-04-02 12:10:56 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Jihoon Son 1ad898bde2
Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
bolkedebruin 8f07a39af7 Skip OS cache on Linux when pulling segments (#5421)
Druid relies on the page cache of Linux in order to have memory segments.
However when loading segments from deep storage or rebalancing the page
cache can get poisoned by segments that should not be in memory yet.
This can significantly slow down Druid in case rebalancing happens
as data that might not be queried often is suddenly in the page cache.

This PR implements the same logic as is in Apache Cassandra and Apache
Bookkeeper.

Closes #4746
2018-03-08 07:54:21 -08:00
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Himanshu 632e44c539 use reflection to call hadoop fs.rename to workaround different hadoop jar version in main and hdfs-storage extension class loader (#5296)
* use reflection to call hadoop fs.rename to workaround different hadoop jar version in main and hdfs-storage extension class loader

* find rename method recursively
2018-01-29 10:26:53 -08:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Jihoon Son 5d0619f5ce Support retrying for PrefetchableTextFilesFirehoseFactory when prefetch is disabled (#5162)
* Add RetryingInputStream

* unnecessary exception

* fix PrefetchableTextFilesFirehoseFactoryTest

* Fix retrying on connection reset

* fix start offset

* fix checkstyle

* fix check connection reset

* address comments

* fix compile

* address comments

* address comments
2018-01-10 17:37:19 +01:00
David Lim a7967ade4d Support replaceExisting parameter for segments pushers (#5187)
* support replaceExisting parameter for segments pushers

* code review changes

* code review changes
2018-01-03 16:13:21 -08:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Roman Leventov 3541b7544b Prohibit and remove unused declarations in the processing module (#4930)
* Prohibit and remove unused declarations in the processing module

* Fix tests

* Fix integration tests

* Suppress unused

* Try to remove SuppressWarnings unused in VirtualColumn

* Remove reset 'false positives'

* Annotate CliCommandCreator as ExtensionPoint

* Unused import warning instead of error in IntelliJ

* Fixes

* Add comment

* Fix AzureBlob

* Fix CloudFilesBlob

* Address comments

* Add Project SDK section to INTELLIJ_SETUP.md

* Fix image
2017-11-09 09:27:27 -08:00
Roman Leventov 5eb08c27cb Add Emitter monitoring (#4973)
* Add Emitter monitoring

* Fix typo

* Fixes

* testing new emitter

* Fix failed test (#71)

* testing new emitter

* fix on failed test

* Remove emitter's readTimeout from docs

* Update docs

* Add HttpEmittingMonitor

* Update java-util to 1.3.2
2017-11-03 21:27:57 -06:00
Gian Merlino 1f2074c247 Bump versions in master to 0.11.1-SNAPSHOT. (#4878)
* Bump versions in master to 0.11.1-SNAPSHOT.

* Missed a few.
2017-09-28 17:09:51 -05:00
Roman Leventov e267f3901b Enforce Indentation with Checkstyle (#4799) 2017-09-21 13:06:48 -07:00
hzy001 4f61dc66a9 Remove the deprecated variable localChildren (#4357)
Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>
2017-08-24 15:27:34 -05:00
Roman Leventov cbd1902db8 Add forbidden-apis plugin; prohibit using system time zone (#4611)
* Forbidden APIs WIP

* Remove some tests

* Restore io.druid.math.expr.Function

* Integration tests fix

* Add comments

* Fix in SimpleWorkerProvisioningStrategy

* Formatting

* Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest

* Address comments

* Fix GroupByMultiSegmentTest
2017-08-21 13:02:42 -07:00
Roman Leventov aa7e4ae5e4 Enforce correct spacing with Checkstyle (#4651) 2017-08-05 10:18:25 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Himanshu 61c38b66ad exclude aws-java-sdk from hadoop-aws dep in hdfs-storage module (#4437)
* exclude aws-java-sdk from hdfs-storage module

* address review comments
2017-06-22 15:56:35 -05:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
Slim a2584d214a Delagate creation of segmentPath/LoadSpec to DataSegmentPushers and add S3a support (#4116)
* Adding s3a schema and s3a implem to hdfs storage module.

* use 2.7.3

* use segment pusher to make loadspec

* move getStorageDir and makeLoad spec under DataSegmentPusher

* fix uts

* fix comment part1

* move to hadoop 2.8

* inject deep storage properties

* set version to 2.7.3

* fix build issue about static class

* fix comments

* fix default hadoop default coordinate

* fix create filesytem

* downgrade aws sdk

* bump the version
2017-06-04 00:55:09 -06:00
Benedict Jin e823085866 Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135) 2017-05-17 01:38:51 +09:00
Akash Dwivedi a2419654ea Allow hadoop configurations using runtime properties. (#4189) 2017-04-26 00:05:27 +05:30
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Akash Dwivedi bebf9f34c7 HdfsDataSegmentPusher bug fix (#4003)
* Fix for HdfsDataSegmentPusher.

* Add missing loadspec in actual descriptor file. Tests to check actual content of descriptor file.
2017-03-06 00:53:44 -08:00
Akash Dwivedi 94da5e80f9 Namespace optimization for hdfs data segments. (#3877)
* NN optimization for hdfs data segments.

* HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage
format.Docs update.

* Common utility function in DataSegmentPusherUtil.

* new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper

* reuse getHdfsStorageDirUptoVersion in
DataSegmentPusherUtil.getHdfsStorageDir()

* Addressed comments.

* Review comments.

* HdfsDataSegmentKiller requested changes.

* extra newline

* Add maprfs.
2017-03-01 09:51:20 -08:00
Gian Merlino f21641f0dc Fix over-optimistic log message. (#3963)
"Wrote task log" could be logged before the output stream is flushed and
closed, which could generate an error and not actually write the log.
2017-02-22 15:02:53 -08:00
Parag Jain edb032b96d add datasource in intermediate segment path (#3961) 2017-02-22 16:31:00 -06:00
Akash Dwivedi 8854ce018e File.deleteOnExit() (#3923)
* Less use of File.deleteOnExit()
 * removed deleteOnExit from most of the tests/benchmarks/iopeon
 * Made IOpeon closable

* Formatting.

* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
2017-02-13 15:12:14 -08:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Akash Dwivedi e550d48772 Using fully qualified hdfs path. (#3705)
* Using fully qualified hdfs path.

* Review changes.

* Remove unused imports.

* Variable name change.
2017-01-17 14:40:22 -06:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Himanshu 4ca3b7f1e4 overlord helpers framework and tasklog auto cleanup (#3677)
* overlord helpers framework and tasklog auto cleanup

* review comment changes

* further review comments addressed
2016-12-21 15:18:55 -08:00
Roman Leventov 7b56cec3b9 Fix resource leaks (#3702) 2016-11-18 21:21:36 +05:30
Himanshu b76b3f8d85 reset-cluster command to clean up druid state stored on metadata and deep storage (#3670) 2016-11-09 11:07:01 -06:00
Himanshu 2362effd8c use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650) 2016-11-02 15:58:55 -07:00
Himanshu eb70a12e43 fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636) 2016-11-01 12:45:38 -05:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Gian Merlino 0ce33bc95f HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577)
Fixes #3576.
2016-10-17 10:37:18 -04:00
Parag Jain c255dd8b19 fix datasegment metadata (#3555) 2016-10-07 16:30:33 -05:00
Parag Jain 76a60a007e create parent dir on HDFS if it does not exist (#3547) 2016-10-06 16:14:00 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Nishant 6099d20303 [FIX] ReleaseException when the path is being written by multiple tasks (#3494)
* fix ReleaseException when the path is being written by multiple task

* Do not throw IOException if another replica wins the race for segment creation

fix if check

* handle logging comments

* fix test
2016-09-22 14:25:41 -05:00
Slim ba6ddf307e Adding hadoop kerberos authentification. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Xavier Léauté 485e381387 remove datasource from hadoop output path (#3196)
fixes #2083, follow-up to #1702
2016-06-29 08:53:45 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Charles Allen 6b957aa072 [QTL] Make URI Exctraction Namespace take more sane arguments (#2738)
* Make URI Exctraction Namespace take more sane arguments
* Fixes https://github.com/druid-io/druid/issues/2669

* Update docs

* Rename error message

* Undo overzealous deletion of docs

* Explain caching mechanism a bit more in docs
2016-05-02 12:54:34 -07:00
Gian Merlino 977e867ad8 Downgrade geoip2, exclude com.google.http-client.
Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client
from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting
to run Druid along with an upgraded com.google.http-client) while preventing
Jackson conflicts pointed out in #2717.

Fixes #2717.

This reverts commit 21b7572533.
2016-03-25 14:43:22 -07:00
Gian Merlino 7e7a886f65 Move druid-api into the druid repo.
This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf
in the druid-api repo.
2016-03-24 11:04:34 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Nishant ba1185963b Fix a bunch of dependencies
* Eliminate exclusion groups from pull-deps
* Only consider dependency nodes in pull-deps if they are not in the following scopes
	* provided
	* test
	* system
* Fix a bunch of `<scope>provided</scope>` missing tags
* Better exclusions for a couple of problematic libs
2016-03-10 10:18:08 -08:00
fjy e3e932a4d4 refactor extensions into core and contrib 2016-03-08 17:12:09 -08:00