Commit Graph

246 Commits

Author SHA1 Message Date
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen 2a76bdc60a Abstractify hadoopy indexer configuration.
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
Xavier Léauté 4466e77b25 Merge pull request #1371 from guobingkun/unit_test
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Bingkun Guo b46aff12ae Unit test for IndexGeneratorJob 2015-05-18 12:31:16 -05:00
Fangjin Yang a2dc58cd2d Merge pull request #1345 from pjain1/unit_test_warn_fix
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain 01448d264c Fix warn msg and added some unit tests 2015-05-07 17:10:05 -05:00
fjy b19435d172 fix typos with batch ingestion in docs 2015-05-07 14:46:17 -07:00
Bingkun Guo 1ee550dd91 Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and add two unit tests for DeterminePartitionsJob and LocalDataSegmentKiller 2015-05-04 20:00:29 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar 3044bf5592 use Job.getInstance() to fix deprecated warnings 2015-04-09 13:22:21 -04:00
Xavier Léauté 8b5fa8f85d always upload SNAPSHOT self-contained jars 2015-04-03 21:18:09 -07:00
Dia Kharrat 3a6dc99384 log invalid rows in mapper of Hadoop indexer 2015-03-19 22:31:04 -07:00
Dia Kharrat 58d5f5e7f0 Honor ignoreInvalidRows in Hadoop indexer
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
Himanshu Gupta 8c1f0834ba Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs 2015-03-19 18:37:08 -05:00
Himanshu Gupta 3f7a7ba5d3 For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat 2015-03-18 16:09:45 -05:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
Himanshu Gupta 6a0405de20 fail early if there is no input data for batch hadoop indexing 2015-03-07 12:45:57 -06:00
Himanshu Gupta 30f64ff19e UTs update for indexing-hadoop 2015-02-25 15:45:57 -08:00
Xavier Léauté 0784d7e30e Merge pull request #1152 from himanshug/metastorage-pwd-provider
support for metadata store PasswordProvider interface
2015-02-25 15:19:37 -08:00
Fangjin Yang 708f35151d Merge pull request #1121 from gianm/issue-1116
Use the proper FileSystems for writing segments and caching jars. (for issue #1116)
2015-02-25 13:03:59 -08:00
Fangjin Yang 6424815f88 Merge pull request #1097 from metamx/better-hadoop-sort-key
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-25 12:49:58 -08:00
Himanshu Gupta 126262edce support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file 2015-02-25 14:05:19 -06:00
Himanshu Gupta 01a4f19ea2 removing dependency on NativeS3FileSystem and other file systems 2015-02-23 14:27:50 -06:00
Gian Merlino fd5a7d1f08 Use the proper FileSystems for writing segments and caching jars. (for issue #1116) 2015-02-12 16:20:10 -08:00
Xavier Léauté b1ec7afc12 Sort HadoopIndexer rows by time+dim bucket to help reduce spilling 2015-02-10 14:26:28 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté cd9635ff5e Merge pull request #1034 from druid-io/minor-rename
minor rename of things in hadoop ingestion config to match 0.6.x
2015-01-15 15:46:13 -08:00
fjy ccddbf8747 minor rename of things in hadoop ingestion config to match 0.6.x 2015-01-15 14:04:55 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Fangjin Yang 852e863425 Merge pull request #981 from druid-io/strictModuleTyping
Use Module instead of generic Object in Guice related items
2015-01-05 12:43:20 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Xavier Léauté f1375b0bfb workaround to pass down bitmap type to map-reduce tasks 2015-01-02 17:29:00 -08:00
Charles Allen 7c8d4a7433 Use Module instead of generic Object in Guice related items 2014-12-19 10:54:06 -08:00
fjy 43d27ddaf0 update http client and fix logging 2014-12-15 16:59:57 -08:00
fjy e872952390 fix working path default bug 2014-12-15 14:51:58 -08:00
fjy 28b72a69ad redocumenting ingestion 2014-12-08 16:15:46 -08:00
nishantmonu51 40f223215a fix buffer pool usage 2014-12-05 16:09:26 +05:30
nishantmonu51 6e03a6245f Merge branch 'master' into onheap-incremental-index 2014-12-05 10:40:28 +05:30
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
Gian Merlino 20a7239ffd Replace google-http-client imports with real guava imports. 2014-12-04 10:57:57 -08:00
Charles Allen c2add5730b Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-04 10:57:56 -08:00
xvrl c867d59ee0 fix error message 2014-12-03 15:30:32 -08:00
Xavier Léauté 2e6c254937 metadata injection not needed for indexing service 2014-12-03 15:09:31 -08:00
Gian Merlino d388a8fe89 Replace google-http-client imports with real guava imports. 2014-12-03 10:52:57 -08:00
nishantmonu51 4dc0fdba8a consider mapped size in limit calculation & review comments 2014-12-03 23:47:30 +05:30
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Charles Allen 7cd689be75 Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-02 11:23:04 -08:00
nishantmonu51 eac776f1a7 tests passing with on heap incremental index 2014-12-02 22:29:28 +05:30
Xavier Léauté 59542c41f8 fix port not set in DruidNode 2014-12-01 14:37:28 -08:00
Charles Allen 8b3652a67a Modify HadoopDruidIndexerConfig to give a port of 0 instead of -1 when binding DruidNode @Self annotation 2014-12-01 14:08:41 -08:00
fjy fdeab0c6af make Druid case sensitive 2014-11-19 14:27:31 -08:00
nishantmonu51 f0452c5968 merge from master 2014-11-18 19:34:51 +05:30
nishantmonu51 edf0fc0851 Make hashed partitions spec default
- make hashed partitionsSpec as default partitions spec for 0.7
2014-11-17 19:48:12 +05:30
nishantmonu51 0c2d06475d merge from master 2014-11-17 19:19:18 +05:30
Xavier Léauté 0498df25df override metadata storage injection in CliHadoopIndexer 2014-11-07 13:44:56 -08:00
Xavier Léauté 50a191425c fix injection on MetadataStorageUpdaterJob 2014-11-07 11:11:14 -08:00
Xavier Léauté 20a9aef96a fix test 2014-11-06 17:27:05 -08:00
Xavier Léauté 9c06db021f rename db->metadata postgres->postgresql 2014-10-31 10:30:27 -07:00
jisookim0513 aa754b86e8 build success! 2014-10-24 11:28:42 -07:00
fjy bef74104d9 merge with 0.7.x and resolve any conflicts 2014-10-23 17:24:06 -07:00
fjy d76d57d95d update docs 2014-10-22 16:16:28 -07:00
jisookim0513 37979282fe enabled ansi-quote in mysql; insert statement should now work 2014-10-21 00:09:19 -07:00
jisookim0513 7d5c5f2083 fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider 2014-10-17 00:10:36 -07:00
nishantmonu51 41e88baeca Add test for bucket selection 2014-10-15 23:09:28 +05:30
nishantmonu51 f4a97aebbc fix rollup for hashed partitions
truncate timestamp while calculating the partitionNumber
2014-10-15 22:32:56 +05:30
nishantmonu51 b5d66381f3 more cleanup 2014-10-14 18:32:40 +05:30
nishantmonu51 454acd3f5a remove backwards compatible code
1) remove backwards compatible and deprecated code
2) make hashed partitions spec default
2014-10-13 19:30:44 +05:30
fjy c7b4d5b7b4 Merge branch 'master' into druid-0.7.x
Conflicts:
	processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java
2014-10-02 18:12:10 -07:00
nishantmonu51 ad75a21040 separate offheapIncrementalIndex implementation 2014-10-01 13:58:51 +05:30
jisookim0513 9d7b5d9b0f fixed javadoc; fixed pom files; deleted unnecessary class 2014-09-30 13:47:35 -07:00
nishantmonu51 358ff915bb fix merge conflicts 2014-09-30 22:19:18 +05:30
nishantmonu51 2789536bed merge changes from druid-0.7.x 2014-09-30 22:05:49 +05:30
nishantmonu51 61c7fd2e6e make ingestOffheap tuneable 2014-09-30 15:30:02 +05:30
nishantmonu51 adb4a65e0a Merge branch 'offheap-incremental-index' into mapdb-branch 2014-09-29 12:38:31 +05:30
jisookim0513 74565c9371 cleaned up the code 2014-09-27 13:10:01 -07:00
jisookim0513 aa887edb73 added two separate modules for mysql and postgres 2014-09-27 13:08:53 -07:00
flow 2dd62979bb Fixed the issue where batch ingestion with the indexing service to hdfs ended up with the metadata path in mysql missing the "hdfs://host" prefix. A detailed description can be found here: https://groups.google.com/forum/#!topic/druid-development/ofvSxiPpCxI 2014-09-27 22:26:52 +08:00
jisookim0513 6a641621b2 finished merging into druid-0.7.x; derby not working (to be fixed) 2014-09-26 14:24:53 -07:00
jisookim0513 43cc6283d3 trying to revert files that have overwritten changes 2014-09-26 12:38:04 -07:00
fjy eaf0a48b92 Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/guice/PropertiesModule.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	services/pom.xml
2014-09-26 11:39:24 -07:00
jisookim0513 3bf39cc9f8 attempted to fix merge-conflicts 2014-09-24 15:55:42 -07:00
nishantmonu51 f51ab84386 merge changes from druid-0.7.x 2014-09-22 23:48:45 +05:30
nishantmonu51 443e5788fb make OffheapIncrementalIndex tuneable 2014-09-22 19:26:10 +05:30
jisookim0513 273205f217 initial attempt for abstraction; druid cluster works with Derby as a default 2014-09-19 17:39:59 -07:00
nishantmonu51 8eb6466487 revert buffer size and add back rowFlushBoundary 2014-09-19 23:06:04 +05:30
Xavier Léauté d501b052ea remove unused columnConfig 2014-09-15 13:02:47 -07:00
Xavier Léauté e57e2d97ba make constants final 2014-09-15 12:53:40 -07:00
fjy 469ccbbe5e Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/query/FinalizeResultsQueryRunner.java
	processing/src/main/java/io/druid/query/UnionQueryRunner.java
	processing/src/main/java/io/druid/query/groupby/GroupByQueryRunnerFactory.java
	processing/src/main/java/io/druid/query/topn/TopNQueryEngine.java
	processing/src/main/java/io/druid/query/topn/TopNQueryRunnerFactory.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	server/src/test/java/io/druid/server/initialization/JettyTest.java
	services/pom.xml
2014-09-11 16:20:50 -07:00
fjy fec7b43fcb make making v9 segments something completely configurable 2014-09-10 15:28:30 -07:00
fjy 351afb8be7 allow legacy index generator 2014-09-09 17:04:35 -07:00
Xavier Léauté 58ab759fc6 remove unused imports 2014-08-29 14:03:47 -07:00
Xavier Léauté ac05836833 make Java 8 javadoc happy 2014-08-29 13:58:50 -07:00
fjy 12f4147df5 switch index gen job to use logging indicator 2014-08-21 13:28:15 -07:00
fjy d64879ccca more cleanup 2014-08-20 13:22:42 -07:00
fjy bb73b2556e fix compilation 2014-08-20 13:17:00 -07:00
fjy 92f26d9a1f cleanup rowflushboundary 2014-08-20 13:09:37 -07:00