Commit Graph

306 Commits

Author SHA1 Message Date
Himanshu Gupta 0eec1bbee2 json serde tests for HadoopTuningConfig 2015-07-20 12:01:53 -05:00
Himanshu Gupta f836c3a7ac adding flag useCombiner to hadoop tuning config that can be used to add a
hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible
2015-07-20 12:01:53 -05:00
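The useCombiner flag above enables a combiner that rolls up rows on the mapper side. Rows that share a timestamp bucket and dimension values are merged there, so less data crosses the shuffle. The following is a hypothetical Java sketch. `Row`, `CombinerSketch`, and summing a single count metric are assumptions for illustration, not the actual Druid combiner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row: an already-truncated timestamp, dimension values, one long metric.
final class Row {
  final long timestamp;
  final List<String> dims;
  final long count;

  Row(long timestamp, List<String> dims, long count) {
    this.timestamp = timestamp;
    this.dims = dims;
    this.count = count;
  }
}

final class CombinerSketch {
  // Merge rows sharing (timestamp, dims) by summing their metric, i.e. the
  // map-side rollup a combiner could perform before the shuffle.
  static List<Row> combine(List<Row> rows) {
    Map<List<Object>, Row> merged = new LinkedHashMap<>();
    for (Row r : rows) {
      List<Object> key = List.of(r.timestamp, r.dims);
      merged.merge(key, r, (a, b) -> new Row(a.timestamp, a.dims, a.count + b.count));
    }
    return new ArrayList<>(merged.values());
  }
}
```

The merge is only safe when every aggregator is associative (as a sum is). This is why the commit says "if possible".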
Himanshu Gupta 4ef484048a take control of InputRow serde between Mapper/Reducer in Hadoop Indexing
This allows hadoop batch ingestion to use an arbitrary InputFormat that
can return records with value types other than Text
2015-07-20 12:01:53 -05:00
Himanshu Gupta f7a92db332 generic byte[] serde for InputRow 2015-07-20 12:01:53 -05:00
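A generic `byte[]` serde for InputRow, as in the commits above, lets the Mapper hand rows to the Reducer regardless of the input value type. The layout below is a hypothetical encoding built on `DataOutputStream`: timestamp, then dimension name/value pairs, then one long metric. It is not the format Druid actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoded form of a row.
final class DecodedRow {
  final long timestamp;
  final Map<String, String> dims;
  final long metric;

  DecodedRow(long timestamp, Map<String, String> dims, long metric) {
    this.timestamp = timestamp;
    this.dims = dims;
    this.metric = metric;
  }
}

final class RowSerde {
  // Encode: timestamp, dimension count, (name, value) pairs, then the metric.
  static byte[] toBytes(long timestamp, Map<String, String> dims, long metric) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(timestamp);
      out.writeInt(dims.size());
      for (Map.Entry<String, String> e : dims.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
      out.writeLong(metric);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Decode the fields in exactly the order they were written.
  static DecodedRow fromBytes(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      long timestamp = in.readLong();
      int n = in.readInt();
      Map<String, String> dims = new LinkedHashMap<>();
      for (int i = 0; i < n; i++) {
        String name = in.readUTF();
        String value = in.readUTF();
        dims.put(name, value);
      }
      long metric = in.readLong();
      return new DecodedRow(timestamp, dims, metric);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```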
Charles Allen b2bc46be17 Merge pull request #1484 from tubemogul/feature/1463
JobHelper.ensurePaths will set job properties from config (tuningConf…
2015-07-07 10:58:16 -07:00
Michael Schiff 6ad451a44a JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properties set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

javadoc for addInputPaths to explain relationship with
addJobProperties
2015-07-01 12:45:32 -07:00
Davide Anastasia 4a3a7dd1ad read hadoop-indexer configuration file from HDFS 2015-06-24 14:08:53 -07:00
Hao Xia 1931491c9f A couple of hdfs related fixes
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Charles Allen 94a567732a Wipe FileContext off the face of the earth
* Fixes https://github.com/druid-io/druid/issues/1433
* Works around https://issues.apache.org/jira/browse/HADOOP-10643
* Reverts to the prior method of renaming
2015-06-16 09:48:09 -07:00
Charles Allen 6230ac90ae Use IndexMerger for conversion 2015-06-10 11:34:58 -07:00
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen 2a76bdc60a Abstractify hadoopy indexer configuration.
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
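One reading of "always return a path with scheme" is that a path without a scheme is qualified against the default filesystem URI. A minimal sketch using `java.net.URI` follows. It assumes absolute paths, and `PathSketch.withScheme` is a hypothetical helper, not a JobHelper method.

```java
import java.net.URI;

final class PathSketch {
  // Hypothetical helper: if the path has no scheme, resolve it against the
  // default filesystem (e.g. "hdfs://namenode:8020") so the result always
  // carries a scheme and authority. Paths that already have a scheme are kept.
  static String withScheme(String path, String defaultFs) {
    URI uri = URI.create(path);
    if (uri.getScheme() != null) {
      return uri.toString();
    }
    if (!path.startsWith("/")) {
      throw new IllegalArgumentException("expected an absolute path: " + path);
    }
    // An absolute child path keeps its path and takes the base's scheme and authority.
    return URI.create(defaultFs).resolve(uri).toString();
  }
}
```

A scheme-qualified path also makes it possible to derive the loadSpec type from the filesystem scheme rather than from the FileSystem class name, as the last bullet above describes.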
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
Xavier Léauté 4466e77b25 Merge pull request #1371 from guobingkun/unit_test
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task do not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Bingkun Guo b46aff12ae Unit test for IndexGeneratorJob 2015-05-18 12:31:16 -05:00
Fangjin Yang a2dc58cd2d Merge pull request #1345 from pjain1/unit_test_warn_fix
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain 01448d264c Fix warn msg and added some unit tests 2015-05-07 17:10:05 -05:00
fjy b19435d172 fix typos with batch ingestion in docs 2015-05-07 14:46:17 -07:00
Bingkun Guo 1ee550dd91 Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and add two unit tests for DeterminePartitionsJob and LocalDataSegmentKiller 2015-05-04 20:00:29 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar 3044bf5592 use Job.getInstance() to fix deprecated warnings 2015-04-09 13:22:21 -04:00
Xavier Léauté 8b5fa8f85d always upload SNAPSHOT self-contained jars 2015-04-03 21:18:09 -07:00
Dia Kharrat 3a6dc99384 log invalid rows in mapper of Hadoop indexer 2015-03-19 22:31:04 -07:00
Dia Kharrat 58d5f5e7f0 Honor ignoreInvalidRows in Hadoop indexer
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
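The ignoreInvalidRows behavior described above comes down to catching parse failures per row. A failure is either rethrown or counted (and logged) so the job can continue. The following is a hypothetical sketch that assumes a toy "timestamp,value" line format. `InvalidRowSketch` and `ParseResult` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result holder: parsed timestamps plus a count of skipped rows.
final class ParseResult {
  final List<Long> timestamps = new ArrayList<>();
  int invalidRows = 0;
}

final class InvalidRowSketch {
  // Toy parser: a line must be "timestamp,value" with a numeric timestamp.
  static long parseTimestamp(String line) {
    String[] parts = line.split(",", -1);
    if (parts.length != 2) {
      throw new IllegalArgumentException("malformed line: " + line);
    }
    return Long.parseLong(parts[0]); // NumberFormatException is an IllegalArgumentException
  }

  // With ignoreInvalidRows, bad lines are counted and skipped; otherwise the
  // first bad line fails the whole run.
  static ParseResult parseAll(List<String> lines, boolean ignoreInvalidRows) {
    ParseResult result = new ParseResult();
    for (String line : lines) {
      try {
        result.timestamps.add(parseTimestamp(line));
      } catch (IllegalArgumentException e) {
        if (!ignoreInvalidRows) {
          throw e;
        }
        result.invalidRows++; // a real job would also log the offending line here
      }
    }
    return result;
  }
}
```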
Himanshu Gupta 8c1f0834ba Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs 2015-03-19 18:37:08 -05:00
Himanshu Gupta 3f7a7ba5d3 For batch hadoop indexing, make the hadoop input format configurable. The given input format must extend org.apache.hadoop.mapreduce.InputFormat 2015-03-18 16:09:45 -05:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
Himanshu Gupta 6a0405de20 fail early if there is no input data for batch hadoop indexing 2015-03-07 12:45:57 -06:00
Himanshu Gupta 30f64ff19e UTs update for indexing-hadoop 2015-02-25 15:45:57 -08:00
Xavier Léauté 0784d7e30e Merge pull request #1152 from himanshug/metastorage-pwd-provider
support for metadata store PasswordProvider interface
2015-02-25 15:19:37 -08:00
Fangjin Yang 708f35151d Merge pull request #1121 from gianm/issue-1116
Use the proper FileSystems for writing segments and caching jars. (for issue #1116)
2015-02-25 13:03:59 -08:00
Fangjin Yang 6424815f88 Merge pull request #1097 from metamx/better-hadoop-sort-key
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-25 12:49:58 -08:00
Himanshu Gupta 126262edce support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file 2015-02-25 14:05:19 -06:00
Himanshu Gupta 01a4f19ea2 removing dependency on NativeS3FileSystem and other file systems 2015-02-23 14:27:50 -06:00
Gian Merlino fd5a7d1f08 Use the proper FileSystems for writing segments and caching jars. (for issue #1116) 2015-02-12 16:20:10 -08:00
Xavier Léauté b1ec7afc12 Sort HadoopIndexer rows by time+dim bucket to help reduce spilling 2015-02-10 14:26:28 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté cd9635ff5e Merge pull request #1034 from druid-io/minor-rename
minor rename of things in hadoop ingestion config to match 0.6.x
2015-01-15 15:46:13 -08:00
fjy ccddbf8747 minor rename of things in hadoop ingestion config to match 0.6.x 2015-01-15 14:04:55 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Fangjin Yang 852e863425 Merge pull request #981 from druid-io/strictModuleTyping
Use Module instead of generic Object in Guice related items
2015-01-05 12:43:20 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Xavier Léauté f1375b0bfb workaround to pass down bitmap type to map-reduce tasks 2015-01-02 17:29:00 -08:00
Charles Allen 7c8d4a7433 Use Module instead of generic Object in Guice related items 2014-12-19 10:54:06 -08:00
fjy 43d27ddaf0 update http client and fix logging 2014-12-15 16:59:57 -08:00
fjy e872952390 fix working path default bug 2014-12-15 14:51:58 -08:00
fjy 28b72a69ad redocumenting ingestion 2014-12-08 16:15:46 -08:00
nishantmonu51 40f223215a fix buffer pool usage 2014-12-05 16:09:26 +05:30
nishantmonu51 6e03a6245f Merge branch 'master' into onheap-incremental-index 2014-12-05 10:40:28 +05:30
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
Gian Merlino 20a7239ffd Replace google-http-client imports with real guava imports. 2014-12-04 10:57:57 -08:00
Charles Allen c2add5730b Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-04 10:57:56 -08:00
xvrl c867d59ee0 fix error message 2014-12-03 15:30:32 -08:00
Xavier Léauté 2e6c254937 metadata injection not needed for indexing service 2014-12-03 15:09:31 -08:00
Gian Merlino d388a8fe89 Replace google-http-client imports with real guava imports. 2014-12-03 10:52:57 -08:00
nishantmonu51 4dc0fdba8a consider mapped size in limit calculation & review comments 2014-12-03 23:47:30 +05:30
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Charles Allen 7cd689be75 Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-02 11:23:04 -08:00
nishantmonu51 eac776f1a7 tests passing with on heap incremental index 2014-12-02 22:29:28 +05:30
Xavier Léauté 59542c41f8 fix port not set in DruidNode 2014-12-01 14:37:28 -08:00
Charles Allen 8b3652a67a Modify HadoopDruidIndexerConfig to give a port of 0 instead of -1 when binding DruidNode @Self annotation 2014-12-01 14:08:41 -08:00
fjy fdeab0c6af make Druid case sensitive 2014-11-19 14:27:31 -08:00
nishantmonu51 f0452c5968 merge from master 2014-11-18 19:34:51 +05:30
nishantmonu51 edf0fc0851 Make hashed partitions spec default
- make hashed partitionsSpec as default partitions spec for 0.7
2014-11-17 19:48:12 +05:30
nishantmonu51 0c2d06475d merge from master 2014-11-17 19:19:18 +05:30
Xavier Léauté 0498df25df override metadata storage injection in CliHadoopIndexer 2014-11-07 13:44:56 -08:00
Xavier Léauté 50a191425c fix injection on MetadataStorageUpdaterJob 2014-11-07 11:11:14 -08:00
Xavier Léauté 20a9aef96a fix test 2014-11-06 17:27:05 -08:00
Xavier Léauté 9c06db021f rename db->metadata postgres->postgresql 2014-10-31 10:30:27 -07:00
jisookim0513 aa754b86e8 build success! 2014-10-24 11:28:42 -07:00
fjy bef74104d9 merge with 0.7.x and resolve any conflicts 2014-10-23 17:24:06 -07:00
fjy d76d57d95d update docs 2014-10-22 16:16:28 -07:00
jisookim0513 37979282fe enabled ansi-quote in mysql; insert statement should now work 2014-10-21 00:09:19 -07:00
jisookim0513 7d5c5f2083 fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider 2014-10-17 00:10:36 -07:00
nishantmonu51 41e88baeca Add test for bucket selection 2014-10-15 23:09:28 +05:30
nishantmonu51 f4a97aebbc fix rollup for hashed partitions
truncate timestamp while calculating the partitionNumber
2014-10-15 22:32:56 +05:30
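The rollup fix above can be pictured as follows. If the partition number is hashed from the raw timestamp, two rows in the same query-granularity bucket (which should roll up together) can land in different partitions. Truncating the timestamp first keeps them together. This minimal sketch assumes hour granularity in epoch milliseconds and uses `List.hashCode` as the hash. Druid's actual hash function and granularity handling differ.

```java
import java.util.ArrayList;
import java.util.List;

final class HashPartitionSketch {
  static final long HOUR_MILLIS = 3_600_000L;

  // Truncate to the start of the hour, i.e. the assumed rollup bucket.
  static long truncateToHour(long timestampMillis) {
    return timestampMillis - Math.floorMod(timestampMillis, HOUR_MILLIS);
  }

  // Hash the truncated timestamp together with the dimension values,
  // then map the hash into [0, numShards).
  static int partitionNumber(long timestampMillis, List<String> dims, int numShards) {
    List<Object> key = new ArrayList<>();
    key.add(truncateToHour(timestampMillis));
    key.addAll(dims);
    return Math.floorMod(key.hashCode(), numShards);
  }
}
```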
nishantmonu51 b5d66381f3 more cleanup 2014-10-14 18:32:40 +05:30
nishantmonu51 454acd3f5a remove backwards compatible code
1) remove backwards compatible and deprecated code
2) make hashed partitions spec default
2014-10-13 19:30:44 +05:30
fjy c7b4d5b7b4 Merge branch 'master' into druid-0.7.x
Conflicts:
	processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java
2014-10-02 18:12:10 -07:00
nishantmonu51 ad75a21040 separate offheapIncrementalIndex implementation 2014-10-01 13:58:51 +05:30
jisookim0513 9d7b5d9b0f fixed javadoc; fixed pom files; deleted unnecessary class 2014-09-30 13:47:35 -07:00
nishantmonu51 358ff915bb fix merge conflicts 2014-09-30 22:19:18 +05:30
nishantmonu51 2789536bed merge changes from druid-0.7.x 2014-09-30 22:05:49 +05:30
nishantmonu51 61c7fd2e6e make ingestOffheap tuneable 2014-09-30 15:30:02 +05:30
nishantmonu51 adb4a65e0a Merge branch 'offheap-incremental-index' into mapdb-branch 2014-09-29 12:38:31 +05:30
jisookim0513 74565c9371 cleaned up the code 2014-09-27 13:10:01 -07:00
jisookim0513 aa887edb73 added two separate modules for mysql and postgres 2014-09-27 13:08:53 -07:00
flow 2dd62979bb Fixed an issue where batch ingestion to hdfs through the indexing service ended up with segment paths in the mysql metadata missing the "hdfs://host" prefix. A detailed description can be found here: https://groups.google.com/forum/#!topic/druid-development/ofvSxiPpCxI 2014-09-27 22:26:52 +08:00
jisookim0513 6a641621b2 finished merging into druid-0.7.x; derby not working (to be fixed) 2014-09-26 14:24:53 -07:00
jisookim0513 43cc6283d3 trying to revert files that have overwritten changes 2014-09-26 12:38:04 -07:00
fjy eaf0a48b92 Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/guice/PropertiesModule.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	services/pom.xml
2014-09-26 11:39:24 -07:00
jisookim0513 3bf39cc9f8 attempted to fix merge-conflicts 2014-09-24 15:55:42 -07:00
nishantmonu51 f51ab84386 merge changes from druid-0.7.x 2014-09-22 23:48:45 +05:30
nishantmonu51 443e5788fb make OffheapIncrementalIndex tuneable 2014-09-22 19:26:10 +05:30
jisookim0513 273205f217 initial attempt for abstraction; druid cluster works with Derby as a default 2014-09-19 17:39:59 -07:00
nishantmonu51 8eb6466487 revert buffer size and add back rowFlushBoundary 2014-09-19 23:06:04 +05:30
Xavier Léauté d501b052ea remove unused columnConfig 2014-09-15 13:02:47 -07:00
Xavier Léauté e57e2d97ba make constants final 2014-09-15 12:53:40 -07:00
fjy 469ccbbe5e Merge branch 'master' into druid-0.7.x
Conflicts:
	cassandra-storage/pom.xml
	common/pom.xml
	examples/pom.xml
	hdfs-storage/pom.xml
	histogram/pom.xml
	indexing-hadoop/pom.xml
	indexing-service/pom.xml
	kafka-eight/pom.xml
	kafka-seven/pom.xml
	pom.xml
	processing/pom.xml
	processing/src/main/java/io/druid/query/FinalizeResultsQueryRunner.java
	processing/src/main/java/io/druid/query/UnionQueryRunner.java
	processing/src/main/java/io/druid/query/groupby/GroupByQueryRunnerFactory.java
	processing/src/main/java/io/druid/query/topn/TopNQueryEngine.java
	processing/src/main/java/io/druid/query/topn/TopNQueryRunnerFactory.java
	rabbitmq/pom.xml
	s3-extensions/pom.xml
	server/pom.xml
	server/src/test/java/io/druid/server/initialization/JettyTest.java
	services/pom.xml
2014-09-11 16:20:50 -07:00
fjy fec7b43fcb make making v9 segments something completely configurable 2014-09-10 15:28:30 -07:00
fjy 351afb8be7 allow legacy index generator 2014-09-09 17:04:35 -07:00
Xavier Léauté 58ab759fc6 remove unused imports 2014-08-29 14:03:47 -07:00
Xavier Léauté ac05836833 make Java 8 javadoc happy 2014-08-29 13:58:50 -07:00
fjy 12f4147df5 switch index gen job to use logging indicator 2014-08-21 13:28:15 -07:00
fjy d64879ccca more cleanup 2014-08-20 13:22:42 -07:00
fjy bb73b2556e fix compilation 2014-08-20 13:17:00 -07:00
fjy 92f26d9a1f cleanup rowflushboundary 2014-08-20 13:09:37 -07:00
nishantmonu51 79ff993b31 increase default buffer size to 512m 2014-08-20 22:15:06 +05:30
nishantmonu51 33354cf7fe replace maxRowsInMemory with BufferSize 2014-08-20 20:59:44 +05:30
fjy 88a904e0b3 address cr about progress ind 2014-08-19 12:59:01 -07:00
nishantmonu51 c6712739dc merge changes from druid-0.7.x 2014-08-12 15:47:42 +05:30
nishantmonu51 9598a524a8 review comment - move index closure to finally 2014-08-12 14:58:55 +05:30
nishantmonu51 637bd35785 merge changes from druid-0.7.x 2014-07-31 16:07:22 +05:30
nishantmonu51 4ce12470a1 Add way to skip determine partitions for index task
Add a way to skip determinePartitions for IndexTask by manually
specifying numShards.
2014-07-18 18:52:15 +05:30
nishantmonu51 f5f05e3a9b Sync changes from branch new-ingestion PR #599
Sync and Resolve Conflicts
2014-07-11 16:15:10 +05:30
nishantmonu51 fa43049240 review comments & pom changes 2014-07-10 11:48:46 +05:30
nishantmonu51 36fc85736c Add ShardSpec Lookup
Optimize choosing shardSpec for Hash Partitions
2014-07-08 18:01:31 +05:30
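Roughly, a ShardSpec lookup for hash partitions replaces a per-row scan over every shard spec with a table indexed by partition number. The following is a hypothetical sketch. It assumes `numShards` hash shards numbered 0..numShards-1 and a precomputed row hash. `HashShard` is an illustrative stand-in, not Druid's shard spec class.

```java
import java.util.List;

// Hypothetical hash-based shard: owns rows whose hash maps to its partition number.
final class HashShard {
  final int partitionNum;
  final int partitions;

  HashShard(int partitionNum, int partitions) {
    this.partitionNum = partitionNum;
    this.partitions = partitions;
  }

  boolean isInChunk(int rowHash) {
    return Math.floorMod(rowHash, partitions) == partitionNum;
  }
}

final class ShardLookupSketch {
  // Linear scan: asks every shard, O(numShards) per row.
  static HashShard findByScan(List<HashShard> shards, int rowHash) {
    for (HashShard s : shards) {
      if (s.isInChunk(rowHash)) {
        return s;
      }
    }
    throw new IllegalStateException("no shard for hash " + rowHash);
  }

  // Build once: index shards by partition number.
  static HashShard[] buildLookup(List<HashShard> shards) {
    HashShard[] table = new HashShard[shards.size()];
    for (HashShard s : shards) {
      table[s.partitionNum] = s;
    }
    return table;
  }

  // O(1) per row: compute the partition number directly.
  static HashShard findByLookup(HashShard[] table, int rowHash) {
    return table[Math.floorMod(rowHash, table.length)];
  }
}
```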
fjy 4c40e71e54 address cr 2014-06-19 14:48:46 -07:00
fjy a870fe5cbe inject column config 2014-06-19 14:47:57 -07:00
Xavier Léauté 09346b0a3c make column cache configurable 2014-06-19 14:43:03 -07:00
fjy a63cda3281 Merge branch 'master' into new-guava
Conflicts:
	server/src/main/java/io/druid/server/QueryResource.java
2014-06-13 10:08:10 -07:00
nishantmonu51 a7e19ad892 configure buffer sizes 2014-06-12 19:32:37 +05:30
nishantmonu51 6265613bb9 Merge branch 'master' into offheap-incremental-index 2014-06-05 17:42:57 +05:30
nishantmonu51 01e8a713b6 unit tests passing with offheap-indexing 2014-06-05 17:42:53 +05:30
Gian Merlino 1ca7bf03b8 IndexGeneratorJob needs to respect isCombineText, too. 2014-06-04 17:54:31 -07:00
fjy adc00f2bcf make combine text configurable 2014-06-04 16:24:56 -07:00
fjy bb4105ed1a fix broken standalone hadoop ingestion 2014-06-04 09:23:46 -07:00
fjy 77ec4df797 update guava, java-util, and druid-api 2014-06-03 13:43:38 -07:00
fjy 4c13327297 more logging for determine hashed 2014-05-30 16:19:20 -07:00
fjy 7be93a770a make all firehoses work with tasks, add a lot more documentation about configuration 2014-05-28 16:33:59 -07:00
Deepak 7d92cf2b3b Update IndexGeneratorJob.java
Using CombineTextInputFormat instead of TextInputFormat combines multiple splits into a single mapper and reduces the strain on the hadoop platform. It greatly improves job completion time because there are fewer mappers to track.
2014-05-22 15:08:12 +05:30
Deepak de0a7b27e7 Update DetermineHashedPartitionsJob.java
Using CombineTextInputFormat instead of TextInputFormat combines multiple splits into a single mapper and reduces the strain on the hadoop platform. It greatly improves job completion time because there are fewer mappers to track.
2014-05-22 15:06:56 +05:30
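The gain described in the two CombineTextInputFormat commits comes from packing many small input chunks into fewer splits, with one mapper per split. The greedy packing below is a simplified sketch under an assumed maximum split size. Hadoop's CombineFileInputFormat also groups chunks by node and rack locality, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

final class CombineSplitsSketch {
  // Greedily pack chunk sizes (bytes) into combined splits whose total stays
  // at or below maxSplitSize. A chunk larger than the limit gets its own split.
  static List<List<Long>> combine(List<Long> chunkSizes, long maxSplitSize) {
    List<List<Long>> splits = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : chunkSizes) {
      if (!current.isEmpty() && currentSize + size > maxSplitSize) {
        splits.add(current); // close the current split; it becomes one mapper's input
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty()) {
      splits.add(current);
    }
    return splits;
  }
}
```

With chunks of 10, 20, 30, 40 and 50 bytes and a 60-byte limit, this yields 3 splits instead of 5, so 3 mappers run instead of 5.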
Xavier Léauté 9ec7c71e0f fix compilation error with updated druid-api 2014-05-19 14:06:23 -07:00
fjy 1100d2f2a1 rename configs to make a bit more sense 2014-05-06 14:52:50 -07:00
fjy b6fb4245aa Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDriverConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfigBuilder.java
	pom.xml
	server/src/main/java/io/druid/segment/realtime/RealtimeManager.java
	server/src/main/java/io/druid/segment/realtime/firehose/EventReceiverFirehoseFactory.java
2014-05-06 14:32:51 -07:00
Gian Merlino bdf9e74a3b Allow config-based overriding of hadoop job properties. 2014-05-06 09:11:31 -07:00
fjy f9523274ac remove extra println 2014-05-01 15:06:51 -07:00
nishantmonu51 5137031304 use same logic for compression
Use same logic for compression across creating files, reading from
files, and checking file existence
2014-05-01 15:20:47 +05:30
nishantmonu51 728f1e8ee3 fix exists check with compression 2014-05-01 15:01:10 +05:30
nishantmonu51 01e84f10b7 add the checks again.
removing these checks breaks when there is no data for any interval
2014-05-01 14:35:09 +05:30
fjy 76e0a48527 Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/DbUpdaterJob.java
	indexing-hadoop/src/test/java/io/druid/indexer/HadoopDruidIndexerConfigTest.java
	indexing-service/src/main/java/io/druid/indexing/common/task/HadoopIndexTask.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumber.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumberSchool.java
2014-04-25 14:03:28 -07:00
fjy 2d1f33e59f Merge pull request #500 from metamx/batch-ingestion-fixes
Batch ingestion fixes
2014-04-22 17:59:24 -06:00
nishantmonu51 357bbf5127 add all the shard specs 2014-04-23 05:23:11 +05:30
nishantmonu51 625a5418d2 minor fix 2014-04-23 05:05:51 +05:30
nishantmonu51 1ca61237c1 review comments- use final variables 2014-04-23 03:33:28 +05:30
nishantmonu51 0d8c1ffe54 review comments and add partitioner 2014-04-23 03:30:30 +05:30
nishantmonu51 ea4a80e8d2 Add serde test for shardCount 2014-04-23 00:24:08 +05:30