Commit Graph

278 Commits

Author SHA1 Message Date
fjy adc00f2bcf make combine text configurable 2014-06-04 16:24:56 -07:00
fjy bb4105ed1a fix broken standalone hadoop ingestion 2014-06-04 09:23:46 -07:00
fjy 77ec4df797 update guava, java-util, and druid-api 2014-06-03 13:43:38 -07:00
fjy 4c13327297 more logging for determine hashed 2014-05-30 16:19:20 -07:00
fjy 7be93a770a make all firehoses work with tasks, add a lot more documentation about configuration 2014-05-28 16:33:59 -07:00
Deepak 7d92cf2b3b Update IndexGeneratorJob.java
CombineTextInputFormat instead of TextInputFormat combines multiple splits for a single mapper and reduces the strain on hadoop platform. It greatly improves job completion time as there are fewer number of mappers to bookkeep.
2014-05-22 15:08:12 +05:30
Deepak de0a7b27e7 Update DetermineHashedPartitionsJob.java
CombineTextInputFormat instead of TextInputFormat combines multiple splits for a single mapper and reduces the strain on hadoop platform. It greatly improves job completion time as there are fewer number of mappers to bookkeep.
2014-05-22 15:06:56 +05:30
Xavier Léauté 9ec7c71e0f fix compilation error with updated druid-api 2014-05-19 14:06:23 -07:00
fjy 1100d2f2a1 rename configs to make a bit more sense 2014-05-06 14:52:50 -07:00
fjy b6fb4245aa Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDriverConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfigBuilder.java
	pom.xml
	server/src/main/java/io/druid/segment/realtime/RealtimeManager.java
	server/src/main/java/io/druid/segment/realtime/firehose/EventReceiverFirehoseFactory.java
2014-05-06 14:32:51 -07:00
Gian Merlino bdf9e74a3b Allow config-based overriding of hadoop job properties. 2014-05-06 09:11:31 -07:00
fjy f9523274ac remove extra println 2014-05-01 15:06:51 -07:00
nishantmonu51 5137031304 use same logic for compression
Use same logic for compression across creating files, reading from
files, and checking file existence
2014-05-01 15:20:47 +05:30
nishantmonu51 728f1e8ee3 fix exists check with compression 2014-05-01 15:01:10 +05:30
nishantmonu51 01e84f10b7 add the checks again.
removing these checks breaks when there is no data for any interval
2014-05-01 14:35:09 +05:30
fjy 76e0a48527 Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/DbUpdaterJob.java
	indexing-hadoop/src/test/java/io/druid/indexer/HadoopDruidIndexerConfigTest.java
	indexing-service/src/main/java/io/druid/indexing/common/task/HadoopIndexTask.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumber.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumberSchool.java
2014-04-25 14:03:28 -07:00
fjy 2d1f33e59f Merge pull request #500 from metamx/batch-ingestion-fixes
Batch ingestion fixes
2014-04-22 17:59:24 -06:00
nishantmonu51 357bbf5127 add all the shard specs 2014-04-23 05:23:11 +05:30
nishantmonu51 625a5418d2 minor fix 2014-04-23 05:05:51 +05:30
nishantmonu51 1ca61237c1 review comments- use final variables 2014-04-23 03:33:28 +05:30
nishantmonu51 0d8c1ffe54 review comments and add partitioner 2014-04-23 03:30:30 +05:30
nishantmonu51 ea4a80e8d2 Add serde test for shardCount 2014-04-23 00:24:08 +05:30
nishantmonu51 e920cec5d0 remove unused import 2014-04-23 00:13:30 +05:30
nishantmonu51 0748eabe9b batch ingestion fixes
1) Fix path when mapped output is compressed
2) Add number of reducers to the determine hashed partitions job
manually
3) Add a way to disable determine partitions and specify shardCount in
HashedPartitionsSpec
2014-04-23 00:05:08 +05:30
Crystark 40a6804192 Support for postgresql
I think it was the last request using 'end' missing the postgresql support.
2014-04-07 17:37:03 +02:00
fjy 2adcf07f5f Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/DetermineHashedPartitionsJob.java
	indexing-service/src/main/java/io/druid/indexing/common/task/RealtimeIndexTask.java
	indexing-service/src/test/java/io/druid/indexing/common/task/TaskSerdeTest.java
	processing/src/test/java/io/druid/segment/TestIndex.java
	server/src/main/java/io/druid/segment/realtime/RealtimeManager.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumberSchool.java
2014-03-17 10:59:31 -07:00
nishantmonu51 4ec1959c30 Use druid implementation of HyperLogLog
remove dependency on clear spring analytics
2014-03-07 00:06:40 +05:30
fjy 5db00afb37 clean up and default values 2014-03-04 14:38:27 -08:00
fjy c4c4d80336 make local testing pass 2014-03-03 14:52:43 -08:00
fjy 46b9ac78e7 Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/test/java/io/druid/indexer/HadoopDruidIndexerConfigTest.java
	pom.xml
	publications/whitepaper/druid.pdf
	publications/whitepaper/druid.tex
2014-03-03 14:48:15 -08:00
fjy 13c7f1c7b1 remove dead code 2014-02-27 15:52:19 -08:00
fjy bf2ddda897 unit tests passing after more refactoring 2014-02-27 15:21:09 -08:00
nishantmonu51 5e0d418b4b fix determine partitions partitioner to work in local mode 2014-02-26 16:31:42 +05:30
nishantmonu51 1ed5254d5b improvements
1) Number of reducers use 1 only when intervals are to be determined
2) Read only useful bytes from BytesWritable
2014-02-26 02:51:45 +05:30
nishantmonu51 8af63005a6 refactor randomPartitionsSpec to hashedPartitionsSpec
refactor to a more appropriate name
2014-02-25 03:07:31 +05:30
fjy 5d2367f0fd unit tests pass at this point 2014-02-20 15:52:12 -08:00
fjy 20cac8c506 not compiling yet but close 2014-02-19 15:54:27 -08:00
fjy 4b7c76762d unit tests passing at this point, finished rt port maybe 2014-02-18 15:14:38 -08:00
nishantmonu51 fde7269c86 check published segments before the intermediate files are cleaned up 2014-02-15 04:30:28 +05:30
fjy 3979eb270c Revert "Revert "Merge branch 'determine-partitions-improvements'""
This reverts commit 189b3e2b9b.
2014-02-14 12:58:56 -08:00
fjy a8c4362d72 rejiggering druid api 2014-02-14 12:57:52 -08:00
fjy 189b3e2b9b Revert "Merge branch 'determine-partitions-improvements'"
This reverts commit 7ad228ceb5, reversing
changes made to 9c55e2b779.
2014-02-14 12:47:34 -08:00
nishantmonu51 48d0c37f98 documentation for random partition spec 2014-02-05 15:30:44 +05:30
nishantmonu51 bacc72415f correct locking and partitionsSpec 2014-02-05 03:17:47 +05:30
nishantmonu51 569452121e fix partitioner for local mode 2014-01-31 21:59:17 +05:30
nishantmonu51 82b748ad43 review comments 2014-01-31 20:19:33 +05:30
nishantmonu51 97e5d68635 determine intervals working with determine partitions 2014-01-31 19:04:52 +05:30
nishantmonu51 5fd76067cd remove logging and use new determine partition job 2014-01-31 13:51:38 +05:30
nishantmonu51 7ca87d59df Determine partitions using cardinality 2014-01-31 00:49:11 +05:30
fjy f898c29e20 fix batch indexing and prepare for next release 2014-01-17 15:52:04 -08:00
fjy 3b17c4c03c a whole bunch of docs and fixes 2014-01-13 18:01:56 -08:00
fjy 1ecc94cfb6 another attempt at index task 2014-01-10 17:56:22 -08:00
Hagen Rother 52746b8ea6 fix hadoop intake's parser exception catching (was too specific) 2013-12-19 07:04:47 +01:00
fjy a1c09df17f make the hadoop index task work again 2013-10-16 09:45:17 -07:00
cheddar c47fe202c7 Fix HadoopDruidIndexer to work with the new way of things
There are multiple and sundry changes in here.

First, "HadoopDruidIndexer" has been split into two pieces, (1) CliHadoop which pulls the hadoop version and builds up the right classpath with the proper hadoop version to run the indexer and (2) CliInternalHadoopIndexer which actually runs the indexer.

In order to work around a bunch of jets3t version conflicts with Hadoop and Druid, I needed to extract the S3 deep storage stuff into its own module.  I then also moved the HDFS stuff into its own module so that I could eliminate the dependency on Hadoop for druid-server.

In doing these changes, I wanted to make the extensions buildable with only the druid-api jar, so a few other things had to move out of Druid and into druid-api.  They are all API-level things, however, so they really belong in druid-api instead.

Lastly, I removed the druid-realtime module and put it all in druid-server.
2013-10-09 15:15:44 -05:00
fjy a79ad7bab4 make dynamic master resource configuration work again 2013-09-27 15:00:40 -07:00
fjy 8bc56daa66 fix things up according to code review comments 2013-09-26 11:35:45 -07:00
fjy 87259321b6 port hadoop druid indexer to new guice framework 2013-09-26 11:04:42 -07:00
cheddar 3c39f90c89 1) Move Firehose interface and dependencies to druid-api
2) Move DataSegment* interfaces and dependencies to druid-api
2013-08-31 16:43:28 -05:00
cheddar 5ab671050e No more com.metamx.druid, it is now all io.druid! 2013-08-30 19:42:12 -05:00
cheddar bd0756e360 More stuff moved, things still compiling and tests still passing. Yay! 2013-08-30 18:58:35 -05:00
cheddar 56e2b956d0 OMG!!! A lot of stuff has been moved. Modules have been created and destroyed, but everything is compiling and unit tests are passing, OMFG this is awesome.! 2013-08-30 18:21:04 -05:00
cheddar 2a46086e20 1) Didn't remove the io.druid files from client. Remove those and make sure things compile
2) Switch DefaultObjectMapper to CommonObjectMapper
3) Create new DefaultObjectMapper in client that has Query stuff registered on it by default
2013-08-29 15:25:36 -05:00
cheddar 9c30ced5ea 1) Move various "api" classes to io.druid packages and make sure things compile and stuff 2013-08-28 15:51:02 -05:00
cheddar 5fa944dd26 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/coordination/BatchDataSegmentAnnouncer.java
	client/src/main/java/com/metamx/druid/curator/announcement/Announcer.java
	client/src/main/java/com/metamx/druid/query/filter/SelectorDimFilter.java
	client/src/main/java/com/metamx/druid/query/search/SearchQueryQueryToolChest.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/tasklogs/S3TaskLogs.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/ForkingTaskRunner.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/RemoteTaskRunner.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/WorkerCuratorCoordinator.java
	indexing-service/src/test/java/com/metamx/druid/indexing/coordinator/RemoteTaskRunnerTest.java
	pom.xml
	server/src/main/java/com/metamx/druid/http/MasterMain.java
	server/src/main/java/com/metamx/druid/http/MasterServletModule.java
	server/src/main/java/com/metamx/druid/master/DruidMasterConfig.java
	server/src/test/java/com/metamx/druid/master/DruidMasterTest.java
	server/src/test/java/com/metamx/druid/query/group/GroupByQueryRunnerTest.java
2013-08-27 14:27:32 -05:00
fjy d11d0a8284 fix according to code review 2013-08-22 10:49:46 -07:00
fjy 778fd0f10e Fix persist of empty indexes in index generator job 2013-08-22 10:16:43 -07:00
cheddar eee1efdcb5 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/client/DruidServerConfig.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/index/ChatHandlerProvider.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/TaskMasterLifecycle.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/executor/ExecutorNode.java
	indexing-service/src/test/java/com/metamx/druid/indexing/coordinator/TaskLifecycleTest.java
2013-08-06 13:33:31 -07:00
cheddar 3c808b15c3 1) Fix HadoopDruidIndexerConfigTest to actually verify the current correct behavior. 2013-08-05 11:37:20 -07:00
cheddar 2b71505421 1) Fix HadoopDruidIndexerConfig to no longer replace ":" with "_" on the segmentOutputDir. The segmentOutputDir is user-supplied so they should have the ability to just not set a bad directory. 2013-08-05 11:22:26 -07:00
cheddar 2361e0112a Make it all compile again... 2013-08-02 10:14:46 -07:00
cheddar 9e78bb38f5 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/QueryableNode.java
	client/src/main/java/com/metamx/druid/client/ServerInventoryView.java
	client/src/main/java/com/metamx/druid/coordination/SingleDataSegmentAnnouncer.java
	client/src/main/java/com/metamx/druid/initialization/CuratorDiscoveryConfig.java
	client/src/main/java/com/metamx/druid/query/MetricsEmittingExecutorService.java
	indexing-hadoop/src/test/java/com/metamx/druid/indexer/HadoopDruidIndexerConfigTest.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/TaskToolbox.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/http/IndexerCoordinatorNode.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/executor/ExecutorNode.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/http/WorkerNode.java
	pom.xml
	server/src/main/java/com/metamx/druid/coordination/ServerManager.java
	server/src/main/java/com/metamx/druid/coordination/ZkCoordinator.java
	server/src/main/java/com/metamx/druid/db/DatabaseRuleManager.java
	server/src/main/java/com/metamx/druid/db/DatabaseSegmentManager.java
	server/src/main/java/com/metamx/druid/http/ComputeNode.java
	server/src/main/java/com/metamx/druid/http/MasterMain.java
	server/src/main/java/com/metamx/druid/loading/SegmentLoaderConfig.java
	server/src/main/java/com/metamx/druid/loading/SingleSegmentLoader.java
	server/src/main/java/com/metamx/druid/master/DruidMaster.java
2013-08-01 16:42:47 -07:00
Jan Rudert ad087a7a22 correct segment path for hadoop indexer 2013-07-10 09:21:45 +02:00
cheddar 2f56c24259 1) Inject IndexingServiceClient
2) Switch all the DBI references to IDBI
2013-06-07 17:37:33 -07:00
cheddar f68df7ab69 1) Make tests work and continue trying to make the DruidMaster start up with just Guice 2013-06-07 12:01:46 -07:00
fjy 42cc87a294 Merge branch 'master' into refactor-indexing
Conflicts:
	indexing-service/src/main/java/com/metamx/druid/indexing/common/task/IndexTask.java
	pom.xml
2013-05-31 17:28:59 -07:00
fjy 08d84001ba Merge branch 'master' into refactor-indexing 2013-05-16 16:03:29 -07:00
fjy 26e0eb62cb merge and other refactorings 2013-05-15 17:28:08 -07:00