druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	e40fba4de2	HadoopIndexTask: Jackson fixes and general overriding of storage-specific stuff	2013-02-28 07:53:28 -08:00
Gian Merlino	f862d9205d	Merge branch 'master' into hadoop-index-task Conflicts: merger/src/main/java/com/metamx/druid/merger/common/task/Task.java	2013-02-27 06:53:06 -08:00
Gian Merlino	7d7ce2b7fe	Fix DeterminePartitionsJob ISE for dimensions not present in all rows	2013-02-25 11:22:25 -08:00
Eric Tschetter	f8c54a72c2	1) Changes to allow for local storage	2013-02-21 15:47:01 -06:00
Deep Ganguli	60da9ffddb	Merge branch 'master' into hadoop-index-task Conflicts: common/src/main/java/com/metamx/druid/utils/JodaUtils.java indexer/src/main/java/com/metamx/druid/indexer/DbUpdaterJob.java indexer/src/main/java/com/metamx/druid/indexer/HadoopDruidIndexerConfig.java indexer/src/main/java/com/metamx/druid/indexer/IndexGeneratorJob.java	2013-02-15 13:22:31 -08:00
Deep Ganguli	e042c29173	Fixed typos in comments, changed prefix on s3Paths to s3n from s3://, cleaned up umbrellaIntervals code in JodaUtils, modified the DbUpdater to push segment meta-data to the database in batches.	2013-02-15 11:39:49 -08:00
Gian Merlino	ce1d90788d	DeterminePartitionsJob: Fix docs	2013-02-13 10:16:03 -08:00
Gian Merlino	a665bfa2ef	DeterminePartitionsJob: Select partition dimension to minimize segment size variance when cardinality is low	2013-02-13 09:51:04 -08:00
Eric Tschetter	a0b159fed7	1) Move up to Jackson 2.1 2) Some bugs were fixed, but they ended up getting mixed in with the Jackson upgrade :(	2013-02-12 20:58:17 -06:00
Eric Tschetter	9ac5eeebb3	1) Fix NPE in DeterminePartitionsJob when it fails	2013-02-06 23:34:46 -08:00
Eric Tschetter	34b4383934	1) Adjust DataSegment to have a binaryVersion field that indicates the expected binary version of the segment files 2) Eliminate race condition from RemoteTaskRunnerTest.testAlreadyExecutedTask()	2013-02-01 18:59:33 -06:00
Deep Ganguli	cb845e6f09	Addressed Gian's code review	2013-01-24 17:54:06 -08:00
Deep Ganguli	017d4779d6	Implemented Hadoop Index Task which takes as input a HadoopDruidIndexConfig and generates index segments. The HadoopIndexTask run method wraps a HadoopDruidIndexerJob run method. The key modifications to the HadoopDruidIndexerJob are as follows: - The UpDaterJobSpec field of the config that is used to set up the indexer job is set to null. This ensures that the job does not push a list of published segments to the database, in order to allow the indexing service to handle this later. - Set the version field of the config file based on the TaskContext. Also changed config.setVersion method to take a string (as opposed to a Date) as input, and propagated this change where necessary. - Set the SegmentOutputDir field of the config file based on the TaskToolbox, to allow the indexing service to handle where to write the segments to. - Added a method to IndexGeneratorJob called getPublishedSegments, that simply returns a list of published segments without publishing this list to the database.	2013-01-23 19:27:14 -08:00
Gian Merlino	77a3f3cbe0	Merge branch 'master' into determine-partitions Conflicts: indexer/src/main/java/com/metamx/druid/indexer/IndexGeneratorJob.java	2013-01-21 14:46:13 -08:00
Gian Merlino	d9e6f1d954	DeterminePartitions follow-up HadoopDruidIndexerConfig: - Add partitionsSpec (backwards compatible with targetPartitionSize and partitionDimension) - Add assumeGrouped flag to partitionsSpec DeterminePartitionsJob: - Skip group-by job if assumeGrouped is set - Clean up code a bit	2013-01-21 14:38:35 -08:00
Eric Tschetter	c8cb96b006	1) Remove vast majority of usages of IndexIO.mapDir() and deprecated it. IndexIO.loadIndex() is the new IndexIO.mapDir() 2) Fix bug with IndexMerger and null columns 3) Add QueryableIndexIndexableAdapter so that QueryableIndexes can be merged 4) Adjust twitter example to have multiple values for each hash tag 5) Adjusted GroupByQueryEngine to just drop dimensions that don't exist instead of throwing an NPE	2013-01-16 17:10:33 -06:00
Gian Merlino	7b42ee6a6e	Rework DeterminePartitionsJob in the hadoop indexer - Can handle non-rolled-up input (by grouping input rows using an additional MR stage) - Can select its own partitioning dimension, if none is supplied - Can detect and avoid oversized shards due to bad dimension value distribution - Shares input parsing code with IndexGeneratorJob	2013-01-16 08:15:01 -08:00
Gian Merlino	616415cb7e	UniformGranularitySpec: Only return bucketInterval for timestamps that legitimately overlap our input intervals	2013-01-15 22:30:17 -08:00
Fangjin Yang	5822f4f5f7	refactor master to run rules before cleaning up; more master stats; general improvements	2012-12-03 14:43:04 -08:00
Fangjin Yang	2e5e1ce989	first commit of tiers for compute nodes; working UT at this point	2012-11-28 17:37:08 -08:00
Eric Tschetter	0f63cb4f00	1) Have IndexGeneratorJob write the descriptors for each of the segments it creates to a path in the temporary working directory (generally HDFS) 2) Have the DbUpdaterJob read descriptors from the temporary working directory instead of looking in the final segment output location (often the eventually consistent S3) 3) 1 and 2 Fixes #30	2012-11-20 15:30:50 -06:00
Eric Tschetter	701cc9562b	1) Adjust the StorageAdapters to lowercase names of metrics and dimensions before looking them up. 2) Add some docs to InputRow/Row to indicate that column names passed into the methods are always lowercase and that the rows need to act accordingly. (fixes #29, or at least clarifies the behavior...)	2012-11-19 17:01:17 -06:00
Fangjin Yang	0ef40171a8	nodes no longer inherit from interfaces but instead extend classes	2012-11-13 13:18:31 -08:00
Fangjin Yang	24564d73e1	register subtypes for reducer	2012-11-12 16:41:34 -08:00
Fangjin Yang	57468d39ef	reverting some of the last changes	2012-11-12 16:14:48 -08:00
Fangjin Yang	c20dccd0f4	modifying the way registering serdes works to hopefully be a bit easier to use	2012-11-12 13:58:43 -08:00
Fangjin Yang	6da047b5fa	fix backwards compatibility issues	2012-11-08 15:09:00 -08:00
Fangjin Yang	34cb352cf8	working indexer with registererers	2012-11-06 14:26:53 -08:00
Fangjin Yang	5698f640d7	fix last commit with version	2012-11-06 12:40:53 -08:00
Fangjin Yang	0b6dd99452	set default version if one is not set	2012-11-06 12:36:55 -08:00
Fangjin Yang	34a221a586	fix bug with jackson conversion	2012-11-06 11:56:48 -08:00
Fangjin Yang	68e5adde33	register registererers in the config	2012-11-06 11:49:17 -08:00
Fangjin Yang	eb2b5a61fa	fix setters for hadoop node	2012-11-05 18:40:54 -08:00
Fangjin Yang	2ae0a15b5a	add register abilities to mapper	2012-11-05 18:31:23 -08:00
Fangjin Yang	9fbee29eb4	change hadoop indexer to be node based	2012-11-05 18:19:04 -08:00
Fangjin Yang	7b2522ff3f	allow hadoop druid indexer to register registererers	2012-11-05 16:13:50 -08:00
Eric Tschetter	27999caca0	1) Create LICENSE 2) Attach copyright and notice of license to files	2012-10-24 05:09:47 -04:00
Eric Tschetter	9d41599967	Initial commit of OSS Druid Code	2012-10-24 03:39:51 -04:00

38 Commits