Commit Graph

896 Commits

Author SHA1 Message Date
Gian Merlino e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
zhxiaog c197a4cf32 fix #1918, add unit tests for RemoteTaskActionClient 2015-11-12 03:15:17 +08:00
Charles Allen abae47850a Add backwards compatability for PR #1922 2015-11-11 10:27:00 -08:00
Charles Allen 1df4baf489 Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Himanshu Gupta 8b67417ac8 make methods in Index[Merger,Maker,IO] non-static so that they can have
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Charles Allen 44a2b204df Add timeout to shutdown request to middle manager for indexing service 2015-10-27 13:56:03 -07:00
Charles Allen d2e400f063 Merge pull request #1740 from metamx/validate-locks
fix #1715
2015-09-29 09:38:42 -07:00
Xavier Léauté 25bbc0b923 Merge pull request #1778 from gianm/redirect-fixes
Redirect fixes
2015-09-25 09:54:48 -07:00
Gian Merlino 348172203f OverlordRedirectInfo: Fix ability to detect that there is no leader. 2015-09-25 09:30:09 -07:00
Parag Jain b630720164 fail task if finishjob throws any exception
add realtime task failure test
2015-09-25 10:55:45 -05:00
Gian Merlino 63bf021077 RemoteTaskRunner: Fix for starting an overlord before any workers ever existed. 2015-09-24 21:15:36 -07:00
Nishant b638400acb fix #1715
fixes #1715
- TaskLockBox has a set of active tasks
- lock requests throws exception for if they are from a task not in
active task set.
- TaskQueue is responsible for updating the active task set on
tasklockbox

fix #1715

fixes #1715
- TaskLockBox has a set of active tasks
- lock requests throws exception for if they are from a task not in
active task set.
- TaskQueue is responsible for updating the active task set on
tasklockbox

review comment

remove duplicate line

use ISE instead

organise imports
2015-09-24 10:06:50 +05:30
Himanshu 61b0743943 Merge pull request #1748 from metamx/forkingJavaOptionsWithQuotes
Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces
2015-09-21 21:03:00 -05:00
Charles Allen 465035e531 Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces 2015-09-21 17:32:27 -07:00
Fangjin Yang e48f6dd660 Merge pull request #1736 from gianm/additional-ingest-segment-timeline-test
IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment.
2015-09-17 14:42:29 -07:00
Gian Merlino 64e33b2bcb IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment. 2015-09-16 10:17:43 -07:00
Himanshu Gupta 74f4572bd4 Lazily deserialize "parser" to InputRowParser in DataSchema
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Fangjin Yang 75a582974b Merge pull request #1639 from gianm/new-plumber
New plumber
2015-09-03 18:52:57 -07:00
Gian Merlino 062a47fba4 Modify Plumbers in these ways,
1) Persist using Committer instead of Runnable. (Although the metadata object
   is ignored in this patch)

2) Remove the getSink method.

3) Plumbers are now responsible for time-based and hydrant-full-based periodic
   committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)
2015-09-03 11:13:06 -07:00
Nishant 726326abc3 Add Task Context and ability to override task specific properties
override javaOpts

fix compilation

review comments

Add Test for typecast

review comments - remove unused method.
2015-09-03 23:36:32 +05:30
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Gian Merlino 414a6fb477 Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Nishant b306739e9c fix convert segment task
1) fix serde
2) fix wrong parameter being passed when creating subtask

remove sysout
2015-08-27 11:34:41 +05:30
Charles Allen e38cf54bc8 Migrate TestDerbyConnector to a JUnit @Rule 2015-08-26 21:47:40 -07:00
Xavier Léauté 51f6a9a2c9 update jackson to 2.6.1 2015-08-25 16:07:01 -07:00
Paul Otto 2301b60365 Add ability to provide taskResource for IndexTask. 2015-08-24 17:38:31 -07:00
Himanshu Gupta 15fa43dd43 changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta 4d4aa8bfc6 refactor IngestSegmentFirehoseFactory so that IngestSegmentFirehose becomes reusable
Conflicts:
	indexing-service/src/main/java/io/druid/indexing/firehose/IngestSegmentFirehoseFactory.java
2015-08-14 14:44:22 -05:00
Charles Allen 1ddaa3fb33 Merge pull request #1592 from metamx/clean-test-files
clean temporary files
2015-08-03 11:47:20 -07:00
Nishant 2679efee7a clean temporary files 2015-08-03 23:32:58 +05:30
Fangjin Yang 6f65e6d3ef Merge pull request #1547 from pjain1/improve_overlord_test
add test to OverlordResourceTest
2015-07-28 07:35:48 -10:00
Parag Jain 2e1b617346 add more tests 2015-07-24 15:12:08 -05:00
Fangjin Yang 97242356b4 Merge pull request #1480 from guobingkun/kill_task_test
Unit tests for KillTask and MetadataTaskStorage
2015-07-20 16:31:45 -07:00
Fangjin Yang 3f7ba58227 Merge pull request #1504 from metamx/fix-1447
fix for #1447
2015-07-14 08:50:08 -07:00
Himanshu e2ddfb7a1a Merge pull request #1511 from pjain1/remove_test
remove flaky overlord test
2015-07-13 18:38:34 -05:00
Parag Jain 59dec89f6a remove flaky overlord test 2015-07-13 15:32:12 -05:00
Himanshu cac722968e Merge pull request #1503 from metamx/fix-leaking-zk-nodes
Fix leaking Status Path nodes in ZK
2015-07-10 17:40:18 -05:00
Fangjin Yang 9f19e96658 Merge pull request #1477 from pjain1/overlord_test
overlord and task master test
2015-07-10 14:27:14 -07:00
Parag Jain 55c4fe64f3 overlord and task master test 2015-07-10 16:17:45 -05:00
Nishant 5fe27fe4ad fix for #1447
fixes #1447
2015-07-09 19:05:48 +05:30
Nishant 8d7a566bae Fix leaking Status Path nodes in ZK
- remove  ZK status path nodes for workers after they are removed
2015-07-09 17:20:09 +05:30
Charles Allen c0b60c0d2f I'm not your mom, indexing-service/test... cleanup after yourself 2015-07-01 15:00:09 -07:00
Bingkun Guo 282a0f9760 Unit tests for KillTask and MetadataTaskStorage 2015-06-29 17:55:41 -05:00
Himanshu b5b9ca1446 Merge pull request #1470 from pjain1/rtindex_test
Realtime Index Task test
2015-06-29 16:51:35 -05:00
Parag Jain 284b80b09e Realtime Index Task test 2015-06-29 09:52:41 -05:00
nishant fb4052d577 JavaScript Worker Select Strategy
this PR adds a JavaScriptWorkerSelectStrategy which allows defining
arbitrary logic for selecting workers to run task using a JavaScript
function.

This gives users full control to implement complex worker selection
strategies based on task attributes.

more tests and a complex javascript config

fix for java8 modify for nashorn compatibility
2015-06-20 02:01:34 +05:30
nishant e9afec4a2b fix task status issues on zk outages
docs

review comments

fix test

review comments

Review comments

fix compilation

fix typo
2015-06-11 00:49:52 +05:30
Xavier Léauté 78d468700b Merge pull request #1388 from metamx/fix-1360
fix race described in 1360
2015-06-10 11:59:36 -07:00
Xavier Léauté f6b336ac3e Merge pull request #1432 from metamx/config-fix
fix passing of config from IndexTuningConfig to RealtimeTuningConfig
2015-06-10 11:42:58 -07:00
nishant 963682d696 Add check for valid rowFlushBoundary configuration and fix tests 2015-06-10 21:38:34 +05:30
nishant 191b302f6a fix passing of config from IndexTuningConfig to RealtimeTuningConfig
- pass rowFlushboundary correctly instead of using default.
- fixes indexTask failing with
io.druid.segment.incremental.IndexSizeExceededException when
rowFlushboundary is set higher than
RealtimeTuningConfig.defaultMaxRowsInMemory

rename test method
2015-06-10 21:07:25 +05:30
nishant af9ea08041 fix race described in 1360
review comments

review comments

review comments

no need to remove

fix test

review comments
2015-06-10 12:19:12 +05:30
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Xavier Léauté d834a974ba flag to enable public IP in EC2-VPC autoscaling 2015-05-28 18:14:12 -07:00
Charles Allen 633fdb029e Add option to ConvertSegmentTask to skip validation
* Validation is enabled by default
2015-04-27 08:37:55 -07:00
Xavier Léauté f73f14ab91 Merge pull request #1297 from metamx/versionConverterTaskUpdates
Update VersionConverterTask for IndexSpec and allowing Forced updates
2015-04-20 16:44:35 -07:00
Charles Allen 7479ac9012 Update VersionConverterTask for IndexSepc and allowing Forced updates 2015-04-20 16:17:06 -07:00
fjy d260515a43 update druid-api version 2015-04-17 14:58:35 -07:00
Xavier Léauté ea5572d001 Merge pull request #1271 from metamx/strictErrorChecking
Add stricter checking for potential coding errors
2015-04-15 15:21:41 -07:00
Charles Allen abdeaa0746 Add stricter checking for potential coding errors
Can use via `mvn clean compile test-compile -P strict'
2015-04-15 14:52:25 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
fjy 195a3b8bb8 ignore rows with invalid interval 2015-04-06 16:08:40 -07:00
Charles Allen 1c6cbea89c Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
This reverts commit f904bc7858.
2015-03-30 13:40:04 -07:00
Fangjin Yang f904bc7858 Revert "Overhaul of SegmentPullers to add consistency and retries" 2015-03-30 13:15:50 -07:00
Charles Allen 6d407e8677 Add URI handling to SegmentPullers
* Requires https://github.com/druid-io/druid-api/pull/37
* Requires https://github.com/metamx/java-util/pull/22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
2015-03-30 12:33:23 -07:00
msprunck 942c17a2aa Remove timeline chunk count assumptions.
* Replace with generic iterables
2015-03-24 22:40:49 +01:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
Xavier Léauté ddfafa0711 randomize task ID to fix spurious test failure 2015-03-12 18:08:48 -07:00
Himanshu Gupta 23545fc01c correctly parse recentlyFinishedThreshold from config 2015-03-12 09:46:57 -05:00
Gian Merlino b810cdfe58 EC2AutoScaler: Allow setting "iamProfile". 2015-03-10 17:41:35 -07:00
Gian Merlino d102a89760 Fix license on EC2AutoScalerSerdeTest. 2015-03-10 17:31:30 -07:00
Gian Merlino 9235b45063 EC2AutoScaler: Support for setting subnetId. 2015-03-10 11:29:56 -07:00
Himanshu Gupta bd5cecdd44 UTs update for indexing service 2015-02-25 15:45:58 -08:00
Charles Allen 79a3e8f59f Fix overriding base of IndexerZkConfig to be absolute instead of relative
* Updated docs to clarify ZK config behavior
* Added unit tests for this case
2015-02-04 13:04:06 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté bd49528805 Merge pull request #1073 from druid-io/fix-statusPath
Fix worker status path announcement with indexer zk config
2015-01-30 12:51:21 -08:00
fjy bc1405bee0 fix worker status path announcement with indexer zk config 2015-01-30 12:26:08 -08:00
Xavier Léauté 2c2771b90e Make dynamic worker selection actually work 2015-01-27 14:17:42 -08:00
fjy 2d516fa591 Add a new equal distribution strategy for assigning tasks 2015-01-20 13:12:22 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Charles Allen 67757b6aea Change IndexerZkConfig to use @JacksonInject instead of just straight @Inject
* Updated IndexerZkConfig to use no setters, and take all arguments from constructor instead
* Also added more unit tests
2015-01-08 11:11:17 -08:00
Charles Allen f6fbb733b8 Added a few places where tests were using Object instead of Module 2015-01-05 13:47:25 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Charles Allen 65286a24e0 Change zk configs to use Jackson injection instead of Skife
* Also added generic config testing class JsonConfigTesterBase
2014-12-29 10:36:12 -08:00
Fangjin Yang b3fe91bb50 Merge pull request #830 from metamx/union-merge-on-historical
Union merge on historical
2014-12-15 13:36:47 -07:00
Xavier Léauté 092dfe0309 fix IndexTaskTest tmp dir
- Create local firehose files in a clean temp directory to avoid
firehose reading other random temp files that start with 'druid'
2014-12-12 17:05:45 -08:00
nishantmonu51 1a1b0e6f23 merge from master and review comments 2014-12-09 13:16:45 +05:30
Gian Merlino 20a7239ffd Replace google-http-client imports with real guava imports. 2014-12-04 10:57:57 -08:00
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Fangjin Yang 3ff569ef2d Merge pull request #879 from metamx/rtr-with-pref
Rewrite autoscaling and enable easier configuration of worker selection and autoscaling behaviour
2014-11-24 17:54:28 -07:00
fjy 9b701bbc76 a few more code review fixes 2014-11-24 10:54:29 -08:00
fjy 1aaea9a0d7 address code review 2014-11-24 10:52:30 -08:00
fjy 580e1172c1 move IndexTask to use hashed partition; fixes #815 2014-11-21 11:15:25 -08:00
fjy fdeab0c6af make Druid case sensitive 2014-11-19 14:27:31 -08:00
fjy 64719b15e0 rewrite autoscaling with tests 2014-11-18 15:41:06 -08:00
fjy c91310914b fix a few naming things 2014-11-17 16:05:18 -08:00
fjy 32600e10bb address code review 2014-11-17 15:55:22 -08:00
fjy 1af6b337f2 optionally choose what worker to send tasks to 2014-11-17 14:50:56 -08:00
nishantmonu51 0c2d06475d merge from master 2014-11-17 19:19:18 +05:30
nishantmonu51 cbffe3c648 merge from master and resolve conflicts 2014-11-17 18:07:08 +05:30
Fangjin Yang 2336e6c167 Merge pull request #758 from metamx/jisoo-metadata
make metadata storage pluggable
2014-11-07 11:30:11 -07:00
nishantmonu51 fd8eb7742b handle union query on realtime node 2014-11-07 23:27:50 +05:30
Xavier Léauté 9bc20ef8bf prefer druid.curator.compress to druid.indexer.runner.compressZnodes 2014-11-06 11:28:51 -08:00
Xavier Léauté 1872b8f979 make it easier to test 2014-10-31 14:49:07 -07:00
Xavier Léauté 9c06db021f rename db->metadata postgres->postgresql 2014-10-31 10:30:27 -07:00
Xavier Léauté 377151beda better abstraction for metadatastorage 2014-10-30 18:23:35 -07:00
jisookim0513 aa754b86e8 build success! 2014-10-24 11:28:42 -07:00
fjy bef74104d9 merge with 0.7.x and resolve any conflicts 2014-10-23 17:24:06 -07:00
jisookim0513 37979282fe enabled ansi-quote in mysql; insert statement should now work 2014-10-21 00:09:19 -07:00
jisookim0513 7d5c5f2083 fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider 2014-10-17 00:10:36 -07:00
nishantmonu51 454acd3f5a remove backwards compatible code
1) remove backwards compatible and deprecated code
2) make hashed partitions spec default
2014-10-13 19:30:44 +05:30
Gian Merlino e1fedbe741 RemoteTaskRunner should respect worker version changes (fixes #787). 2014-10-12 11:27:44 -07:00
jisookim0513 521398267c fixed inconsistent variable names 2014-10-10 17:00:50 -07:00
fjy c7b4d5b7b4 Merge branch 'master' into druid-0.7.x
Conflicts:
	processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java
2014-10-02 18:12:10 -07:00
Gian Merlino 0781781b99 Merge pull request #766 from metamx/extend-rtr
make the worker selection strategy in remotetaskrunner extendable
2014-09-30 12:52:12 -07:00
Gian Merlino 1e6ce8ac9a TaskLogs fixes and cleanups.
- Fix negative offsets in FileTaskLogs and HdfsTaskLogs.
- Consolidate file offset code into LogUtils (currently used in two places).
- Clean up style for HdfsTaskLogs and related classes.
- Remove unused code in ForkingTaskRunner.
2014-09-29 16:20:34 -07:00
fjy 4a09678739 make the selection strategy in rtr extendable 2014-09-29 14:24:02 -07:00
jisookim0513 74565c9371 cleaned up the code 2014-09-27 13:10:01 -07:00
jisookim0513 6a641621b2 finished merging into druid-0.7.x; derby not working (to be fixed) 2014-09-26 14:24:53 -07:00
nishantmonu51 f51ab84386 merge changes from druid-0.7.x 2014-09-22 23:48:45 +05:30
jisookim0513 273205f217 initial attempt for abstraction; druid cluster works with Derby as a default 2014-09-19 17:39:59 -07:00
nishantmonu51 8eb6466487 revert buffer size and add back rowFlushBoundary 2014-09-19 23:06:04 +05:30
fjy fec7b43fcb make making v9 segments something completely configurable 2014-09-10 15:28:30 -07:00
fjy 351afb8be7 allow legacy index generator 2014-09-09 17:04:35 -07:00
Xavier Léauté 58ab759fc6 remove unused imports 2014-08-29 14:03:47 -07:00
Gian Merlino 68aeafaacd Allow indexing tasks to specify extra classpaths.
This could be used by Hadoop tasks to reference configs for different clusters, assuming
that the possible configs have been pre-distributed to middle managers.
2014-08-28 18:00:26 -07:00
fjy d64879ccca more cleanup 2014-08-20 13:22:42 -07:00
nishantmonu51 fe105d52ee use bufferSize for IndexTask 2014-08-20 22:41:34 +05:30
fjy 4fd5479559 fix typo 2014-08-19 12:34:10 -07:00
nishantmonu51 c6712739dc merge changes from druid-0.7.x 2014-08-12 15:47:42 +05:30
fjy 91ebe45b4e support both rejectionPolicy and rejectionPolicyFactory in serde 2014-08-07 10:06:27 -07:00
nishantmonu51 637bd35785 merge changes from druid-0.7.x 2014-07-31 16:07:22 +05:30
Gian Merlino 09fcfc3b6d Fix race in RemoteTaskRunner that could lead to zombie tasks. 2014-07-18 11:41:50 -07:00
nishantmonu51 4ce12470a1 Add way to skip determine partitions for index task
Add a way to skip determinePartitions for IndexTask by manually
specifying numShards.
2014-07-18 18:52:15 +05:30
fjy c6078ca841 address code review 2014-07-17 13:34:05 -07:00
fjy 5197ea527a disable middlemanagers based on worker version 2014-07-17 12:35:45 -07:00
nishantmonu51 f5f05e3a9b Sync changes from branch new-ingestion PR #599
Sync and Resolve Conflicts
2014-07-11 16:15:10 +05:30
nishantmonu51 518ab473f3 improve port finding strategy for task runner
1) Recycle free ports
2) Choose only ports that are free & not used by any other application
2014-07-03 09:58:12 +05:30
fjy a870fe5cbe inject column config 2014-06-19 14:47:57 -07:00
Xavier Léauté 09346b0a3c make column cache configurable 2014-06-19 14:43:03 -07:00
fjy 0bc1915067 Merge pull request #578 from metamx/new-guava
Update guava, java-util, and druid-api
2014-06-18 14:23:32 -06:00
Gian Merlino 9f16f0a955 More flexible EC2 user data classes. 2014-06-17 17:10:17 -07:00
fjy 5227b43050 fix more test breakage 2014-06-17 10:35:01 -07:00
fjy 8a13e34c50 fix broken ut 2014-06-17 10:31:50 -07:00
nishantmonu51 0629be622c remove unnecessary changes & fix index closing subquery 2014-06-16 18:50:49 +05:30
fjy a63cda3281 Merge branch 'master' into new-guava
Conflicts:
	server/src/main/java/io/druid/server/QueryResource.java
2014-06-13 10:08:10 -07:00
nishantmonu51 6265613bb9 Merge branch 'master' into offheap-incremental-index 2014-06-05 17:42:57 +05:30
nishantmonu51 01e8a713b6 unit tests passing with offheap-indexing 2014-06-05 17:42:53 +05:30
fjy 9f4cc5ca1f fix test 2014-06-04 16:29:20 -07:00
fjy 77ec4df797 update guava, java-util, and druid-api 2014-06-03 13:43:38 -07:00
fjy d75cc7b9b8 fix more serde 2014-05-06 15:17:38 -07:00
fjy 1100d2f2a1 rename configs to make a bit more sense 2014-05-06 14:52:50 -07:00
fjy b6fb4245aa Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDriverConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfigBuilder.java
	pom.xml
	server/src/main/java/io/druid/segment/realtime/RealtimeManager.java
	server/src/main/java/io/druid/segment/realtime/firehose/EventReceiverFirehoseFactory.java
2014-05-06 14:32:51 -07:00
Gian Merlino bdf9e74a3b Allow config-based overriding of hadoop job properties. 2014-05-06 09:11:31 -07:00
fjy 76e0a48527 Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/DbUpdaterJob.java
	indexing-hadoop/src/test/java/io/druid/indexer/HadoopDruidIndexerConfigTest.java
	indexing-service/src/main/java/io/druid/indexing/common/task/HadoopIndexTask.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumber.java
	server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumberSchool.java
2014-04-25 14:03:28 -07:00
fjy 4360c050eb fix broken ut 2014-03-25 11:13:52 -07:00
Gian Merlino 70db460f97 Blocking Executors and maxPendingPersists, oh my!
- Execs.newBlockingSingleThreaded can now accept capacity = 0.
- Changed default maxPendingPersists from 2 to 0.
- Fixed serde of maxPendingPersists in RealtimeIndexTasks.
2014-03-05 10:55:12 -08:00
fjy 46b9ac78e7 Merge branch 'master' into new-schema
Conflicts:
	indexing-hadoop/src/test/java/io/druid/indexer/HadoopDruidIndexerConfigTest.java
	pom.xml
	publications/whitepaper/druid.pdf
	publications/whitepaper/druid.tex
2014-03-03 14:48:15 -08:00
fjy bf2ddda897 unit tests passing after more refactoring 2014-02-27 15:21:09 -08:00
Xavier Léauté 2f61035585 add restore task 2014-02-25 13:41:40 -08:00
fjy 5d2367f0fd unit tests pass at this point 2014-02-20 15:52:12 -08:00
fjy 20cac8c506 not compiling yet but close 2014-02-19 15:54:27 -08:00
fjy 4b7c76762d unit tests passingn at this point, finished rt port maybe 2014-02-18 15:14:38 -08:00
fjy 3979eb270c Revert "Revert "Merge branch 'determine-partitions-improvements'""
This reverts commit 189b3e2b9b.
2014-02-14 12:58:56 -08:00
fjy 189b3e2b9b Revert "Merge branch 'determine-partitions-improvements'"
This reverts commit 7ad228ceb5, reversing
changes made to 9c55e2b779.
2014-02-14 12:47:34 -08:00
nishantmonu51 7ad228ceb5 Merge branch 'determine-partitions-improvements'
Conflicts:
	pom.xml
2014-02-12 10:51:26 +05:30
Gian Merlino 5ec634e498 SimpleResourceManagementStrategy: Scale up to minWorkerCount when increased 2014-02-06 13:20:09 -08:00
fjy 14d0e54327 first commit 2014-02-03 14:15:03 -08:00
nishantmonu51 97e5d68635 determine intervals working with determine partitions 2014-01-31 19:04:52 +05:30
fjy 0c789412bb add a workaround for jackson bug where jacksoninject fails when a null value is passed through json creator annotated constructor 2014-01-25 07:07:27 +08:00
fjy 2ff86984da fix broken ut 2014-01-21 10:47:45 -08:00
fjy 1d81ad2946 remove unused class 2014-01-20 16:45:54 -08:00
fjy 1ecc9d0f98 fix the edge case where autoscaling tries to terminate node without ip 2014-01-20 16:44:19 -08:00
Gian Merlino 17ad4ee2f0 Fix RemoteTaskRunnerTest 2013-12-20 11:23:28 -08:00
Gian Merlino 4a722c0a6d Autoscaling changes from code review.
- Log and return immediately when workerSetupData is null
- Allow provisioning more nodes while other nodes are still provisioning
- Add tests for bumping up the minimum version
2013-12-20 08:59:35 -08:00
Gian Merlino 1f4b99634f Autoscaling: Move target count independent of actual count.
This should let us grow and shrink the worker pool in chunks when necessary
(like when a bunch of them go offline, or when there is a worker version
change).
2013-12-19 16:11:30 -08:00
Gian Merlino 1ff855d744 Fix MoveTask serde and ArchiveTask id creation 2013-12-18 15:17:12 -08:00
Xavier Léauté ac2ca0e46c separate move and archive tasks 2013-12-16 14:00:55 -08:00
Xavier Léauté a417cd5df2 add archive task 2013-12-16 13:59:15 -08:00
fjy 01f9c1df31 fix broken task storage config and prepare for next release 2013-12-13 16:45:32 -08:00
Gian Merlino 600dc7546f Configurability of recency threshold 2013-12-13 16:02:54 -08:00
Gian Merlino f36a5b677c TaskLifecycleTest: Add test for noop task 2013-12-13 07:48:28 -08:00
Gian Merlino 3b053a66ff TaskLifecycleTest: Add test for never-ready task 2013-12-13 07:48:27 -08:00
Gian Merlino 6227963af9 TaskQueue: Copy task list before management loop. 2013-12-13 07:48:27 -08:00
Gian Merlino 370e2f855a TaskSerdeTest: Fix IndexTask test by including an actual firehoseFactory 2013-12-12 13:58:44 -08:00
Gian Merlino 169f149cf9 TaskLifecycleTest: Fix broken setUp and broken assumptions. 2013-12-12 13:51:13 -08:00
Gian Merlino be25d51a2c RemoteTaskRunner: Fix issues leading to failing tests 2013-12-12 13:49:49 -08:00
Gian Merlino 0129ea99cf RemoteTaskRunner changes to make bootstrapping actually work.
- Workers are not added to zkWorkers until caches have been initialized.
- Worker status we haven't heard about will be added to runningTasks or
  completeTasks as appropriate. 
- TaskRunnerWorkItem now only needs a taskId, not the entire Task. This makes
  it possible to create them from TaskStatus objects, if that's all we have.
- Also remove some dead code.
2013-12-12 10:44:46 -08:00
Gian Merlino c4b8c8bc6f Rework indexing service internals to hopefully be more reliable.
The TaskQueue directly manages the TaskRunner. The main management loop runs
periodically and checks that the runner is doing reasonable things. If not, it
attempts to adjust the runner. The management loop also runs on-demand when a
task is added to keep task assignment relatively low latency. The TaskConsumer
is no longer necessary and so it no longer exists.

Task interval locks are handled differently. Instead of some tasks acquiring
locks at runtime and some tasks having implicit fixed lock intervals, all tasks
ask for locks explicitly. This occurs either in "isReady" (which runs on the
overlord) or in "run" (which runs on the peon).

Other changes:
- The TaskQueue is attached to the leader lifecycle, instead of global
- The TaskLockbox is able to sync itself from storage and is no longer
  bootstrapped by the TaskQueue.
- RemoteTaskRunner does not clean up zk paths until asked to. This will
  prevent deletion of statuses that have not yet been committed.
- Added retries on DbTaskStorage operations.
- Removed SpawnTasksAction (no more subtasks)
- Removed obsolete EventReceiverFirehose configs
- Removed obsolete OldOverlordResource
- Removed TaskStorageQueryAdapter methods related to subtasks
2013-12-11 15:05:16 -08:00
fjy a049b42674 fix an issue with task tables not getting created automatically and prepare for next release 2013-11-07 18:01:35 -08:00
fjy a1c09df17f make the hadoop index task work again 2013-10-16 09:45:17 -07:00
fjy 4e509d1d09 Merge branch 'master' into is-docs 2013-10-09 14:05:10 -07:00
cheddar c47fe202c7 Fix HadoopDruidIndexer to work with the new way of things
There are multiple and sundry changes in here.

First, "HadoopDruidIndexer" has been split into two pieces, (1) CliHadoop which pulls the hadoop version and builds up the right classpath with the proper hadoop version to run the indexer and (2) CliInternalHadoopIndexer which actually runs the indexer.

In order to work around a bunch of jets3t version conflicts with Hadoop and Druid, I needed to extract the S3 deep storage stuff into its own module.  I then also moved the HDFS stuff into its own module so that I could eliminate the dependency on Hadoop for druid-server.

In doing these changes, I wanted to make the extensions buildable with only the druid-api jar, so a few other things had to move out of Druid and into druid-api.  They are all API-level things, however, so they really belong in druid-api instead.

Lastly, I removed the druid-realtime module and put it all in druid-server.
2013-10-09 15:15:44 -05:00
fjy 4ec4b8e024 rewrite indexing service docs 2013-10-08 16:34:58 -07:00
fjy 703b674800 add availability zone info to autoscaling 2013-10-07 12:16:50 -07:00
fjy ac330f72bb first set of changes to standarize the naming convention we use in druid 2013-10-03 16:36:48 -07:00
fjy 8bc56daa66 fix things up according to code review comments 2013-09-26 11:35:45 -07:00
fjy 87259321b6 port hadoop druid indexer to new guice framework 2013-09-26 11:04:42 -07:00
fjy dc8a119787 fix broken unit tests are a result of the last merge 2013-09-23 12:56:01 -07:00
fjy cabae7993d port over multi threaded realtime and also fix broken realtime nodes that can't start up 2013-09-16 16:03:47 -07:00
fjy f7c10e3594 rework tests in indexing service to be more unit testy 2013-09-12 16:37:58 -07:00
cheddar a2dcc45a8e 1) Remove SingleSegmentLoader and replace with OmniSegmentLoader 2013-09-12 11:47:03 -05:00
cheddar 6c9a107356 1) remove duplicate package initialization.initialization 2013-09-09 17:02:57 -05:00
cheddar 3c39f90c89 1) Move Firehose interface and dependencies to druid-api
2) Move DataSegment* interfaces and dependencies to druid-api
2013-08-31 16:43:28 -05:00
cheddar 5ab671050e No more com.metamx.druid, it is now all io.druid! 2013-08-30 19:42:12 -05:00
cheddar bd0756e360 More stuff moved, things still compiling and tests still passing. Yay! 2013-08-30 18:58:35 -05:00
cheddar 56e2b956d0 OMG!!! A lot of stuff has been moved. Modules have been created and destroyed, but everything is compiling and unit tests are passing, OMFG this is awesome.! 2013-08-30 18:21:04 -05:00
cheddar 9c30ced5ea 1) Move various "api" classes to io.druid packages and make sure things compile and stuff 2013-08-28 15:51:02 -05:00
cheddar ee1e73cfa1 1) Make it compile again after the merge 2013-08-27 14:36:01 -05:00
cheddar 5fa944dd26 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/coordination/BatchDataSegmentAnnouncer.java
	client/src/main/java/com/metamx/druid/curator/announcement/Announcer.java
	client/src/main/java/com/metamx/druid/query/filter/SelectorDimFilter.java
	client/src/main/java/com/metamx/druid/query/search/SearchQueryQueryToolChest.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/tasklogs/S3TaskLogs.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/ForkingTaskRunner.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/RemoteTaskRunner.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/WorkerCuratorCoordinator.java
	indexing-service/src/test/java/com/metamx/druid/indexing/coordinator/RemoteTaskRunnerTest.java
	pom.xml
	server/src/main/java/com/metamx/druid/http/MasterMain.java
	server/src/main/java/com/metamx/druid/http/MasterServletModule.java
	server/src/main/java/com/metamx/druid/master/DruidMasterConfig.java
	server/src/test/java/com/metamx/druid/master/DruidMasterTest.java
	server/src/test/java/com/metamx/druid/query/group/GroupByQueryRunnerTest.java
2013-08-27 14:27:32 -05:00
cheddar 3617ac17fc 1) Eliminate ExecutorMain and have it run using the new Main! 2013-08-27 14:11:05 -05:00
cheddar 269997dc94 1) ExecutorNode is working, except for the running of the task. Need to adjust it to be able to run a task and then everything will be wonderful 2013-08-26 18:08:41 -05:00
cheddar 55dbda2046 1) Worker appears to be running! It's also now known as the MiddleManager 2013-08-23 17:59:48 -05:00
cheddar b897c2cb22 1) IndexCoordinator appears to work as the CliOverlord now, yay! 2013-08-23 14:11:34 -05:00
fjy d92ab8bb58 more logs for RTR 2013-08-21 21:47:59 -07:00
fjy 88661b26a0 bug fix for RTR removing workers race condition and partition chunks not being sorted by chunk number 2013-08-21 11:14:54 -07:00
Gian Merlino 455645e723 Workers announce TaskAnnouncement rather than TaskStatus 2013-08-20 16:14:36 -07:00
Gian Merlino 4e8325f963 Better tests and error messages for TaskResource 2013-08-20 14:01:38 -07:00
Gian Merlino d8493f8e26 RealtimeIndexTask: Fix "resource" serde 2013-08-20 13:02:52 -07:00
fjy 1fb6107a37 fix the case where RTR does not clean up a completed task on startup 2013-08-15 13:09:02 -07:00
cheddar 4a64ce37ed Finish the merging, wtf IntelliJ? 2013-08-06 13:34:24 -07:00
cheddar eee1efdcb5 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/client/DruidServerConfig.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/index/ChatHandlerProvider.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/TaskMasterLifecycle.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/executor/ExecutorNode.java
	indexing-service/src/test/java/com/metamx/druid/indexing/coordinator/TaskLifecycleTest.java
2013-08-06 13:33:31 -07:00
fjy 479f0cefca fix bug with RTR not assigning tasks when a new worker is available 2013-08-05 17:57:59 -07:00
fjy 35f89d7232 make RTR idempotent to multiple run requests for same task, because higher level things in the indexing service require this behaviour 2013-08-05 14:44:01 -07:00
cheddar 2361e0112a Make it all compile again... 2013-08-02 10:14:46 -07:00
fjy 584ccac833 move scanning of workers and tasks into RTR start, simplify bootstrap, make tests better 2013-08-01 17:50:05 -07:00
cheddar 9e78bb38f5 Merge branch 'master' into guice
Conflicts:
	client/src/main/java/com/metamx/druid/QueryableNode.java
	client/src/main/java/com/metamx/druid/client/ServerInventoryView.java
	client/src/main/java/com/metamx/druid/coordination/SingleDataSegmentAnnouncer.java
	client/src/main/java/com/metamx/druid/initialization/CuratorDiscoveryConfig.java
	client/src/main/java/com/metamx/druid/query/MetricsEmittingExecutorService.java
	indexing-hadoop/src/test/java/com/metamx/druid/indexer/HadoopDruidIndexerConfigTest.java
	indexing-service/src/main/java/com/metamx/druid/indexing/common/TaskToolbox.java
	indexing-service/src/main/java/com/metamx/druid/indexing/coordinator/http/IndexerCoordinatorNode.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/executor/ExecutorNode.java
	indexing-service/src/main/java/com/metamx/druid/indexing/worker/http/WorkerNode.java
	pom.xml
	server/src/main/java/com/metamx/druid/coordination/ServerManager.java
	server/src/main/java/com/metamx/druid/coordination/ZkCoordinator.java
	server/src/main/java/com/metamx/druid/db/DatabaseRuleManager.java
	server/src/main/java/com/metamx/druid/db/DatabaseSegmentManager.java
	server/src/main/java/com/metamx/druid/http/ComputeNode.java
	server/src/main/java/com/metamx/druid/http/MasterMain.java
	server/src/main/java/com/metamx/druid/loading/SegmentLoaderConfig.java
	server/src/main/java/com/metamx/druid/loading/SingleSegmentLoader.java
	server/src/main/java/com/metamx/druid/master/DruidMaster.java
2013-08-01 16:42:47 -07:00
fjy 4ae8395538 1) on bootstrap, load all initial data and do a compare with bootstrapped tasks, delete any that are extra out there
2) change autoscaling logic such that it only works with remote task runnrs
3) zk workers use their own status caches to determine what they are running
2013-07-26 14:32:08 -07:00
fjy a1262760b2 Merge branch 'master' into worker-resource 2013-07-26 10:02:15 -07:00
Gian Merlino 30d98f56c1 RealtimeIndexTask: Allow configurable rejection policies 2013-07-24 16:12:54 -07:00
Gian Merlino 0eebb0a149 Add RealtimeMetricsMonitor to RealtimeIndexTask 2013-07-24 14:39:59 -07:00
fjy b9578a1ada 1) remove retry logic from RTR
2) simplify configs
3) introduce task resource
4) make worker versions match coordinator version by default
2013-07-12 12:51:12 -07:00
fjy 7219ed15d3 fix according to code review 2013-07-02 15:56:12 -07:00
fjy caa68e101a first commit; things working right now 2013-06-19 15:56:45 -07:00
cheddar f68df7ab69 1) Make tests work and continue trying to make the DruidMaster start up with just Guice 2013-06-07 12:01:46 -07:00
fjy e4ea357b52 multiple bug fixes for indexing service; scv quotes 2013-06-06 14:10:18 -07:00
fjy 06931ee0f5 introduce availability groups 2013-06-04 17:12:19 -07:00
fjy 42cc87a294 Merge branch 'master' into refactor-indexing
Conflicts:
	indexing-service/src/main/java/com/metamx/druid/indexing/common/task/IndexTask.java
	pom.xml
2013-05-31 17:28:59 -07:00
fjy 6213c0b63c Merge branch 'master' into refactor-indexing 2013-05-15 17:14:40 -07:00
fjy 20ae1d8b6b lots of cleanups and refactorings 2013-05-15 15:37:04 -07:00