Commit Graph

2760 Commits

Author SHA1 Message Date
Deep Ganguli cb845e6f09 Addressed Gian's code review 2013-01-24 17:54:06 -08:00
Fangjin Yang 7f410f201d updating amazon sdk version 2013-01-24 17:47:10 -08:00
Eric Tschetter ee7337fbb9 1) Adjust the Timeseries caching fixes to still store the long, but do the timezone adjustment on the way out.
2) Store a reference to the granularity object instead of getting it every time
2013-01-24 18:25:21 -06:00
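
The fix above boils down to caching the raw epoch millis and only applying the timezone when results are read back out, with the granularity held as a field rather than re-resolved per row. A minimal sketch of that read path, assuming Joda-Time (class and field names here are illustrative, not the actual Druid code):

```java
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

// Illustrative only: the cache keeps a plain UTC epoch-millis long, and the
// timezone adjustment happens on the way out, when the result is materialized.
class CachedTimeseriesRow
{
  private final DateTimeZone zone;     // kept as a field, not looked up per row
  private final long timestampMillis;  // stored value: raw UTC millis

  CachedTimeseriesRow(DateTimeZone zone, long timestampMillis)
  {
    this.zone = zone;
    this.timestampMillis = timestampMillis;
  }

  DateTime getTimestamp()
  {
    // Reinterpret the stored long in the query's time zone only when read.
    return new DateTime(timestampMillis, zone);
  }
}
```
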
cheddar ec034ddef4 Merge pull request #56 from metamx/determine-partitions
Determine partitions better
2013-01-24 12:57:39 -08:00
cheddar ad6f962000 Merge pull request #58 from metamx/cache-bulkget
modify cacheBroker interface to implement bulk-get
2013-01-24 09:57:43 -08:00
Deep Ganguli 017d4779d6 Implemented Hadoop Index Task which takes as input a HadoopDruidIndexConfig and
generates index segments.

The HadoopIndexTask run method wraps a HadoopDruidIndexerJob run method. The
key modifications to the HadoopDruidIndexerJob are as follows:

- The UpDaterJobSpec field of the config that is used to set up the indexer job
  is set to null. This ensures that the job does not push a list of published
segments to the database, in order to allow the indexing service to handle this
later.
- Set the version field of the config file based on the TaskContext. Also
  changed config.setVersion method to take a string (as opposed to a Date) as
input, and propagated this change where necessary.
- Set the SegmentOutputDir field of the config file based on the TaskToolbox,
  to allow the indexing service to handle where to write the segments to.
- Added a method to IndexGeneratorJob called getPublishedSegments, that simply
  returns a list of published segments without publishing this list to the
database.
2013-01-23 19:27:14 -08:00
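
A rough sketch of the wrapping described in the commit above. The real task uses Druid's HadoopDruidIndexerConfig, TaskContext, and TaskToolbox; the interfaces below are stand-ins so the sketch is self-contained, and all names and signatures are assumptions rather than the actual 2013 API:

```java
import java.util.List;

// Stand-ins for the real config and job classes; names/signatures are assumed.
interface IndexerConfig
{
  void setUpdaterJobSpec(Object spec);   // null => the job itself won't publish to the DB
  void setVersion(String version);       // now takes a String, not a Date
  void setSegmentOutputDir(String dir);  // decided by the indexing service
}

interface IndexerJob
{
  boolean run();
  List<String> getPublishedSegments();   // returned to the caller, not written to the DB
}

class HadoopIndexTaskSketch
{
  List<String> run(IndexerConfig config, IndexerJob job, String taskVersion, String segmentOutputDir)
  {
    config.setUpdaterJobSpec(null);               // indexing service publishes segments later
    config.setVersion(taskVersion);               // taken from the TaskContext
    config.setSegmentOutputDir(segmentOutputDir); // taken from the TaskToolbox
    job.run();
    return job.getPublishedSegments();            // list of produced segments, unpublished
  }
}
```
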
Deep Ganguli fc07bc315e Added umbrellaInterval method, which takes an Iterable of intervals and returns
a single interval spanning the entire range of input intervals.
2013-01-23 18:59:51 -08:00
Eric Tschetter 9b6244ec15 Merge branch 'master' of github.com:metamx/druid 2013-01-23 18:37:03 -06:00
Eric Tschetter 67feee3cd6 1) Indexes don't always have an index.drd file anymore 2013-01-23 18:36:52 -06:00
xvrl 55ae4c87dd timezone support in groupby query 2013-01-23 08:51:02 -08:00
xvrl 35058786d9 match query interval to granularity for this test 2013-01-23 08:50:43 -08:00
xvrl 86a6d112e3 proper groupby tests 2013-01-22 16:54:14 -08:00
Fangjin Yang 272d737517 cleaning up some interactions with RTR and workers 2013-01-22 16:21:38 -08:00
xvrl d7ea8e9afc compare result timestamp based on millis + utcoffset 2013-01-21 17:01:41 -08:00
xvrl 8f38b775ae fix expected object type 2013-01-21 16:31:32 -08:00
xvrl f05c050c53 add test for timezone 2013-01-21 15:49:39 -08:00
Nelson Ray 94b72e8878 replace param BalancerCostAnalyzer getter with a factory 2013-01-21 15:32:29 -08:00
Fangjin Yang bab9ee8827 Merge branch 'master' into killsegments
Conflicts:
	merger/src/main/java/com/metamx/druid/merger/coordinator/http/IndexerCoordinatorNode.java
2013-01-21 14:47:49 -08:00
Gian Merlino 77a3f3cbe0 Merge branch 'master' into determine-partitions
Conflicts:
	indexer/src/main/java/com/metamx/druid/indexer/IndexGeneratorJob.java
2013-01-21 14:46:13 -08:00
Gian Merlino d9e6f1d954 DeterminePartitions follow-up
HadoopDruidIndexerConfig:
- Add partitionsSpec (backwards compatible with targetPartitionSize and partitionDimension)
- Add assumeGrouped flag to partitionsSpec

DeterminePartitionsJob:
- Skip group-by job if assumeGrouped is set
- Clean up code a bit
2013-01-21 14:38:35 -08:00
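
The partitionsSpec described above would typically surface as a small Jackson-bound config object. A hedged sketch with the field names taken from the commit message; the annotations, defaults, and class shape are guesses, not the real class:

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Sketch of a partitionsSpec config block; only the field names come from the
// commit message, everything else is an assumption.
class PartitionsSpecSketch
{
  private final Long targetPartitionSize;   // target number of rows per shard
  private final String partitionDimension;  // null => the job picks a dimension itself
  private final boolean assumeGrouped;      // true => skip the group-by MR stage

  @JsonCreator
  PartitionsSpecSketch(
      @JsonProperty("targetPartitionSize") Long targetPartitionSize,
      @JsonProperty("partitionDimension") String partitionDimension,
      @JsonProperty("assumeGrouped") Boolean assumeGrouped
  )
  {
    this.targetPartitionSize = targetPartitionSize;
    this.partitionDimension = partitionDimension;
    this.assumeGrouped = assumeGrouped != null && assumeGrouped;
  }

  @JsonProperty
  public Long getTargetPartitionSize() { return targetPartitionSize; }

  @JsonProperty
  public String getPartitionDimension() { return partitionDimension; }

  @JsonProperty
  public boolean isAssumeGrouped() { return assumeGrouped; }
}
```
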
xvrl 068ca67dba fix cache not preserving timezone information 2013-01-21 14:38:04 -08:00
Nelson Ray 2d7113b263 huge simplification of balancing code 2013-01-21 14:28:25 -08:00
xvrl 40c0bcad29 simplify MapCache 2013-01-18 18:25:51 -08:00
cheddar 07131c51ed Merge pull request #53 from metamx/autoscaling
Use a database to store configurations for indexer workers
2013-01-18 17:08:52 -08:00
Fangjin Yang ac31afbce5 remove redundant index for primary key in config table 2013-01-18 16:45:23 -08:00
cheddar 8a8b8a59f9 Merge pull request #57 from metamx/yops-descriptor-dimensions
YeOldePlumberSchool: Populate dimension metadata in segment descriptor
2013-01-18 16:42:02 -08:00
xvrl e0c34c3b97 rename cacheBroker -> cache 2013-01-18 15:22:56 -08:00
xvrl 86ca8967ca rework code pulling from cache to be more readable 2013-01-18 15:17:10 -08:00
xvrl a70ae15585 replace Pair<String, ByteBuffer> with NamedKey 2013-01-18 15:17:05 -08:00
xvrl 9032ef521b fix interrupted thread 2013-01-18 15:17:05 -08:00
Gian Merlino 7166534666 YeOldePlumberSchool: Tweak for IndexIO changes 2013-01-17 16:03:18 -08:00
Gian Merlino 5ce53eb2ac Merge branch 'master' into yops-descriptor-dimensions 2013-01-17 16:02:32 -08:00
Fangjin Yang 38b2041ad9 key/value config table 2013-01-17 14:56:48 -08:00
Eric Tschetter 689ce4f8e1 1) Upgrade java-util dependency to include "ruby" time 2013-01-17 13:10:11 -06:00
xvrl 0bacb85a4a fix duplicate keys, shutdown gracefully and make sure we check all multiget keys in memcached benchmark 2013-01-16 19:18:14 -08:00
xvrl dcaa77a883 implement bulk get test 2013-01-16 19:15:43 -08:00
Eric Tschetter 5b1e03530c 1) Fix some bugs found by external test suite 2013-01-16 21:06:57 -06:00
Fangjin Yang 21613bc73b initial commit to hard delete segments 2013-01-16 17:31:01 -08:00
xvrl e2788187fb don't let timeout skew benchmark stats 2013-01-16 16:02:51 -08:00
Eric Tschetter c8cb96b006 1) Remove vast majority of usages of IndexIO.mapDir() and deprecated it. IndexIO.loadIndex() is the new IndexIO.mapDir()
2) Fix bug with IndexMerger and null columns
3) Add QueryableIndexIndexableAdapter so that QueryableIndexes can be merged
4) Adjust twitter example to have multiple values for each hash tag
5) Adjusted GroupByQueryEngine to just drop dimensions that don't exist instead of throwing an NPE
2013-01-16 17:10:33 -06:00
xvrl a2090411a3 modify cacheBroker interface to implement bulk-get 2013-01-16 14:49:50 -08:00
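
This bulk-get change, together with the NamedKey refactor a few commits earlier, amounts to a cache interface that can fetch many namespaced keys in a single call. A hedged sketch of such an interface; the names mirror the commit messages but the exact shape is an assumption:

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical cache interface in the spirit of the cacheBroker bulk-get change.
interface CacheSketch
{
  // A key is a namespace (e.g. per segment or query type) plus raw key bytes.
  final class NamedKey
  {
    final String namespace;
    final byte[] key;

    NamedKey(String namespace, byte[] key)
    {
      this.namespace = namespace;
      this.key = key;
    }

    @Override
    public boolean equals(Object o)
    {
      if (!(o instanceof NamedKey)) {
        return false;
      }
      NamedKey other = (NamedKey) o;
      return namespace.equals(other.namespace) && Arrays.equals(key, other.key);
    }

    @Override
    public int hashCode()
    {
      return 31 * namespace.hashCode() + Arrays.hashCode(key);
    }
  }

  byte[] get(NamedKey key);

  void put(NamedKey key, byte[] value);

  // Bulk-get: one call (and, for memcached, one multiget round trip) for many
  // keys; keys with no cached value are simply absent from the result map.
  Map<NamedKey, byte[]> getBulk(Iterable<NamedKey> keys);
}
```
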
Gian Merlino 6fc350bfba YeOldePlumberSchool: Populate dimension metadata in segment descriptor 2013-01-16 11:30:24 -08:00
Gian Merlino 7b42ee6a6e Rework DeterminePartitionsJob in the hadoop indexer
- Can handle non-rolled-up input (by grouping input rows using an additional MR stage)
- Can select its own partitioning dimension, if none is supplied
- Can detect and avoid oversized shards due to bad dimension value distribution
- Shares input parsing code with IndexGeneratorJob
2013-01-16 08:15:01 -08:00
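
One way to read the "select its own partitioning dimension" and "avoid oversized shards" points above: among candidate dimensions, prefer one whose value distribution does not concentrate rows under a single value. The toy criterion below illustrates that idea only; it is not the actual MR job logic:

```java
import java.util.Map;

// Toy illustration: pick the dimension whose most common value accounts for the
// smallest share of rows, so no shard is dominated by one value. The real
// DeterminePartitionsJob computes its statistics via MapReduce stages.
class PartitionDimensionChooser
{
  // cardinalities: dimension name -> (dimension value -> row count)
  static String choose(Map<String, Map<String, Long>> cardinalities)
  {
    String best = null;
    double bestMaxShare = Double.MAX_VALUE;

    for (Map.Entry<String, Map<String, Long>> dim : cardinalities.entrySet()) {
      long total = 0;
      long maxCount = 0;
      for (long count : dim.getValue().values()) {
        total += count;
        maxCount = Math.max(maxCount, count);
      }
      if (total == 0) {
        continue;
      }
      double maxShare = (double) maxCount / total;
      if (maxShare < bestMaxShare) {
        bestMaxShare = maxShare;
        best = dim.getKey();
      }
    }
    return best;
  }
}
```
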
Eric Tschetter 538d00e75e Merge branch 'master' of github.com:metamx/druid 2013-01-16 10:02:01 -06:00
Eric Tschetter 8b31d8db9f 1) Adjust IndexMerger to convert the indexes it creates from the old format to the new. This is done quite sub-optimally, but it will work for now... 2013-01-16 10:01:46 -06:00
Gian Merlino 616415cb7e UniformGranularitySpec: Only return bucketInterval for timestamps that legitimately
overlap our input intervals
2013-01-15 22:30:17 -08:00
Gian Merlino 86277d1114 StringInputRowParser: Treat dimensionExclusions case-insensitively 2013-01-15 22:30:17 -08:00
Fangjin Yang 3c0880fe28 Merge branch 'master' into autoscaling 2013-01-15 13:36:39 -08:00
Fangjin Yang 7e074e8158 fix pom breakage 2013-01-15 12:04:12 -08:00
Fangjin Yang 47ac36d3e4 Merge branch 'master' of github.com:metamx/druid into autoscaling 2013-01-15 12:02:32 -08:00