Commit Graph

9417 Commits

Author SHA1 Message Date
Gian Merlino 82d77a8b72 Index service: Fix kill task 2013-01-25 13:15:49 -08:00
Gian Merlino 553738e1d8 Merge branch 'master' into task-stuff 2013-01-25 11:34:18 -08:00
Gian Merlino f3b04d3f5f S3SegmentKiller: Add TODO note 2013-01-25 11:33:50 -08:00
Gian Merlino e6a618ca76 Merge branch 'killsegments' into task-stuff
Conflicts:
	merger/src/main/java/com/metamx/druid/merger/common/TaskStatus.java
	merger/src/main/java/com/metamx/druid/merger/common/task/DeleteTask.java
	merger/src/main/java/com/metamx/druid/merger/common/task/IndexGeneratorTask.java
	merger/src/main/java/com/metamx/druid/merger/common/task/IndexTask.java
	merger/src/main/java/com/metamx/druid/merger/common/task/MergeTask.java
	merger/src/main/java/com/metamx/druid/merger/coordinator/LocalTaskRunner.java
	merger/src/main/java/com/metamx/druid/merger/coordinator/TaskQueue.java
	merger/src/main/java/com/metamx/druid/merger/coordinator/exec/TaskConsumer.java
	merger/src/main/java/com/metamx/druid/merger/worker/TaskMonitor.java
	merger/src/test/java/com/metamx/druid/merger/coordinator/RemoteTaskRunnerTest.java
	merger/src/test/java/com/metamx/druid/merger/coordinator/TaskQueueTest.java
2013-01-25 11:30:10 -08:00
Gian Merlino a14200d779 Index service support for early returns and choice of commit semantics.
Task:
- Add TaskCallback to run method (for early returns)

TaskStatus:
- Remove CONTINUED status
- Add segmentsNuked (placeholder for future deletion support)
- Add more builder methods
- Add validations to constructor

TaskStorage:
- Add TaskStorageQueryAdapter, a concrete class that wraps TaskStorages and
  provides various read-only convenience methods
- Add getTask method for benefit of TaskStorageQueryAdapter

TaskQueue:
- Rename "done" to "notify"
- notify is responsible for deciding if we should commit
- Add optional commitRunnable to "notify", which gets called when it's time to commit
- Allow nextTasks and commits to run early (statusCode RUNNING)
- Move getStatus, collapseStatus functionality to TaskStorageQueryAdapter
2013-01-25 11:05:34 -08:00
Gian Merlino 37417cf22f QueryServlet: Add "host" (remote address) to alerts 2013-01-25 11:03:44 -08:00
xvrl 5e1cac6d9f timezone test 2013-01-24 17:57:10 -08:00
Deep Ganguli cb845e6f09 Addressed Gian's code review 2013-01-24 17:54:06 -08:00
Fangjin Yang 7f410f201d updating amazon sdk version 2013-01-24 17:47:10 -08:00
Eric Tschetter ee7337fbb9 1) Adjust the Timeseries caching fixes to still store the long, but do the timezone adjustment on the way out.
2) Store a reference to the granularity object instead of getting it every time
2013-01-24 18:25:21 -06:00
cheddar ec034ddef4 Merge pull request #56 from metamx/determine-partitions
Determine partitions better
2013-01-24 12:57:39 -08:00
cheddar ad6f962000 Merge pull request #58 from metamx/cache-bulkget
modify cacheBroker interface to implement bulk-get
2013-01-24 09:57:43 -08:00
Deep Ganguli 017d4779d6 Implemented Hadoop Index Task which takes as input a HadoopDruidIndexConfig and
generates index segments.

The HadoopIndexTask run method wraps a HadoopDruidIndexerJob run method. The
key modifications to the HadoopDruidIndexerJob are as follows:

- The UpDaterJobSpec field of the config that is used to set up the indexer job
  is set to null. This ensures that the job does not push a list of published
  segments to the database, in order to allow the indexing service to handle this
  later.
- Set the version field of the config file based on the TaskContext. Also
  changed config.setVersion method to take a string (as opposed to a Date) as
  input, and propagated this change where necessary.
- Set the SegmentOutputDir field of the config file based on the TaskToolbox,
  to allow the indexing service to handle where to write the segments to.
- Added a method to IndexGeneratorJob called getPublishedSegments, that simply
  returns a list of published segments without publishing this list to the
  database.
2013-01-23 19:27:14 -08:00
Deep Ganguli fc07bc315e Added umbrellaInterval method, which takes an Iterable of intervals and returns
a single interval spanning the entire range of input intervals.
2013-01-23 18:59:51 -08:00
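
The umbrellaInterval behaviour described in commit fc07bc315e can be sketched roughly as follows. This is a minimal illustration, not the committed code: the `Interval` class here is a hypothetical stand-in holding start and end milliseconds, the class name `UmbrellaIntervalSketch` is invented, and rejecting empty input is an assumption, since the commit message does not say what happens in that case.

```java
import java.util.Arrays;
import java.util.List;

public class UmbrellaIntervalSketch {
    // Hypothetical stand-in for an interval type: [startMillis, endMillis).
    static final class Interval {
        final long startMillis;
        final long endMillis;

        Interval(long startMillis, long endMillis) {
            if (endMillis < startMillis) {
                throw new IllegalArgumentException("end before start");
            }
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }
    }

    // Returns the smallest single interval that covers every input interval:
    // the earliest start paired with the latest end.
    static Interval umbrellaInterval(Iterable<Interval> intervals) {
        boolean seen = false;
        long minStart = Long.MAX_VALUE;
        long maxEnd = Long.MIN_VALUE;
        for (Interval interval : intervals) {
            seen = true;
            minStart = Math.min(minStart, interval.startMillis);
            maxEnd = Math.max(maxEnd, interval.endMillis);
        }
        // Assumption: an empty input has no meaningful umbrella, so reject it.
        if (!seen) {
            throw new IllegalArgumentException("no intervals given");
        }
        return new Interval(minStart, maxEnd);
    }

    public static void main(String[] args) {
        List<Interval> input = Arrays.asList(
            new Interval(100L, 200L),
            new Interval(50L, 120L),
            new Interval(300L, 400L));
        Interval umbrella = umbrellaInterval(input);
        // Gaps between inputs (200..300 here) are covered by the result as well.
        System.out.println(umbrella.startMillis + " .. " + umbrella.endMillis);
    }
}
```

The result depends only on the minimum start and maximum end, so the inputs may overlap, be disjoint, or arrive in any order.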
Eric Tschetter 9b6244ec15 Merge branch 'master' of github.com:metamx/druid 2013-01-23 18:37:03 -06:00
Eric Tschetter 67feee3cd6 1) Indexes don't always have an index.drd file anymore 2013-01-23 18:36:52 -06:00
xvrl 55ae4c87dd timezone support in groupby query 2013-01-23 08:51:02 -08:00
xvrl 35058786d9 match query interval to granularity for this test 2013-01-23 08:50:43 -08:00
xvrl 86a6d112e3 proper groupby tests 2013-01-22 16:54:14 -08:00
Fangjin Yang 272d737517 cleaning up some interactions with RTR and workers 2013-01-22 16:21:38 -08:00
xvrl d7ea8e9afc compare result timestamp based on millis + utcoffset 2013-01-21 17:01:41 -08:00
xvrl 8f38b775ae fix expected object type 2013-01-21 16:31:32 -08:00
xvrl f05c050c53 add test for timezone 2013-01-21 15:49:39 -08:00
Nelson Ray 94b72e8878 replace param BalancerCostAnalyzer getter with a factory 2013-01-21 15:32:29 -08:00
Fangjin Yang bab9ee8827 Merge branch 'master' into killsegments
Conflicts:
	merger/src/main/java/com/metamx/druid/merger/coordinator/http/IndexerCoordinatorNode.java
2013-01-21 14:47:49 -08:00
Gian Merlino 77a3f3cbe0 Merge branch 'master' into determine-partitions
Conflicts:
	indexer/src/main/java/com/metamx/druid/indexer/IndexGeneratorJob.java
2013-01-21 14:46:13 -08:00
Gian Merlino d9e6f1d954 DeterminePartitions follow-up
HadoopDruidIndexerConfig:
- Add partitionsSpec (backwards compatible with targetPartitionSize and partitionDimension)
- Add assumeGrouped flag to partitionsSpec

DeterminePartitionsJob:
- Skip group-by job if assumeGrouped is set
- Clean up code a bit
2013-01-21 14:38:35 -08:00
xvrl 068ca67dba fix cache not preserving timezone information 2013-01-21 14:38:04 -08:00
Nelson Ray 2d7113b263 huge simplification of balancing code 2013-01-21 14:28:25 -08:00
xvrl 40c0bcad29 simplify MapCache 2013-01-18 18:25:51 -08:00
cheddar 07131c51ed Merge pull request #53 from metamx/autoscaling
Use a database to store configurations for indexer workers
2013-01-18 17:08:52 -08:00
Fangjin Yang ac31afbce5 remove redundant index for primary key in config table 2013-01-18 16:45:23 -08:00
cheddar 8a8b8a59f9 Merge pull request #57 from metamx/yops-descriptor-dimensions
YeOldePlumberSchool: Populate dimension metadata in segment descriptor
2013-01-18 16:42:02 -08:00
xvrl e0c34c3b97 rename cacheBroker -> cache 2013-01-18 15:22:56 -08:00
xvrl 86ca8967ca rework code pulling from cache to be more readable 2013-01-18 15:17:10 -08:00
xvrl a70ae15585 replace Pair<String, ByteBuffer> with NamedKey 2013-01-18 15:17:05 -08:00
xvrl 9032ef521b fix interrupted thread 2013-01-18 15:17:05 -08:00
Gian Merlino 7166534666 YeOldePlumberSchool: Tweak for IndexIO changes 2013-01-17 16:03:18 -08:00
Gian Merlino 5ce53eb2ac Merge branch 'master' into yops-descriptor-dimensions 2013-01-17 16:02:32 -08:00
Fangjin Yang 38b2041ad9 key/value config table 2013-01-17 14:56:48 -08:00
Eric Tschetter 689ce4f8e1 1) Upgrade java-util dependency to include "ruby" time 2013-01-17 13:10:11 -06:00
xvrl 0bacb85a4a fix duplicate keys, shutdown gracefully and make sure we check all multiget keys in memcached benchmark 2013-01-16 19:18:14 -08:00
xvrl dcaa77a883 implement bulk get test 2013-01-16 19:15:43 -08:00
Eric Tschetter 5b1e03530c 1) Fix some bugs found by external test suite 2013-01-16 21:06:57 -06:00
Fangjin Yang 21613bc73b initial commit to hard delete segments 2013-01-16 17:31:01 -08:00
xvrl e2788187fb don't let timeout skew benchmark stats 2013-01-16 16:02:51 -08:00
Eric Tschetter c8cb96b006 1) Remove vast majority of usages of IndexIO.mapDir() and deprecate it. IndexIO.loadIndex() is the new IndexIO.mapDir()
2) Fix bug with IndexMerger and null columns
3) Add QueryableIndexIndexableAdapter so that QueryableIndexes can be merged
4) Adjust twitter example to have multiple values for each hash tag
5) Adjusted GroupByQueryEngine to just drop dimensions that don't exist instead of throwing an NPE
2013-01-16 17:10:33 -06:00
xvrl a2090411a3 modify cacheBroker interface to implement bulk-get 2013-01-16 14:49:50 -08:00
Gian Merlino 6fc350bfba YeOldePlumberSchool: Populate dimension metadata in segment descriptor 2013-01-16 11:30:24 -08:00
Gian Merlino 7b42ee6a6e Rework DeterminePartitionsJob in the hadoop indexer
- Can handle non-rolled-up input (by grouping input rows using an additional MR stage)
- Can select its own partitioning dimension, if none is supplied
- Can detect and avoid oversized shards due to bad dimension value distribution
- Shares input parsing code with IndexGeneratorJob
2013-01-16 08:15:01 -08:00