270 Commits

Author SHA1 Message Date
jon-wei e5c4927b14 Add support for parsing BytesWritable strings to Hadoop Indexer 2015-08-28 14:27:14 -07:00
Gian Merlino 414a6fb477 Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen e38cf54bc8 Migrate TestDerbyConnector to a JUnit @Rule 2015-08-26 21:47:40 -07:00
Himanshu Gupta b3c570e78d update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling 2015-08-20 21:36:34 -05:00
Himanshu Gupta 85e3ce9096 split hadoop glob path before adding it to MultipleInputs
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta a603bd9547 HadoopGlobPathSplitter implementation to split hadoop glob paths
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta cf3ec8eb46 helpful cause explaining why SegmentDescriptorInfo did not exist 2015-08-19 10:29:04 -05:00
Himanshu Gupta a3bab5b7d9 IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing 2015-08-16 14:07:35 -05:00
Himanshu Gupta 15fa43dd43 changing DatasourcePathSpec to get the segment list, so that the hadoop indexer uses an overlord action to get the list of segments when running as an overlord task, and uses the metadata store directly when running as a standalone hadoop indexer
also, the serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not crop up
2015-08-16 14:07:35 -05:00
Himanshu Gupta 45947a1021 add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java

Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
2015-08-16 13:15:38 -05:00
Himanshu Gupta 1ae56f139b Druid Hadoop InputFormat and pathSpec
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
	indexing-service/pom.xml
2015-08-16 13:15:38 -05:00
Himanshu Gupta f1d309a671 do not run parser if value from InputFormat is already an InputRow 2015-08-14 14:44:22 -05:00
Himanshu Gupta 0eec1bbee2 json serde tests for HadoopTuningConfig 2015-07-20 12:01:53 -05:00
Himanshu Gupta f836c3a7ac adding flag useCombiner to hadoop tuning config that can be used to add a
hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible
2015-07-20 12:01:53 -05:00
Himanshu Gupta 4ef484048a take control of InputRow serde between Mapper/Reducer in Hadoop Indexing
This allows for an arbitrary InputFormat during hadoop batch ingestion that
can return records of value type other than Text
2015-07-20 12:01:53 -05:00
Himanshu Gupta f7a92db332 generic byte[] serde for InputRow 2015-07-20 12:01:53 -05:00
Charles Allen b2bc46be17 Merge pull request #1484 from tubemogul/feature/1463
JobHelper.ensurePaths will set job properties from config (tuningConf…
2015-07-07 10:58:16 -07:00
Michael Schiff 6ad451a44a JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properties set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

javadoc for addInputPaths to explain relationship with
addJobProperties
2015-07-01 12:45:32 -07:00
Davide Anastasia 4a3a7dd1ad read hadoop-indexer configuration file from HDFS 2015-06-24 14:08:53 -07:00
Hao Xia 1931491c9f A couple of hdfs related fixes
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Charles Allen 94a567732a Wipe FileContext off the face of the earth
* Fixes https://github.com/druid-io/druid/issues/1433
* Works around https://issues.apache.org/jira/browse/HADOOP-10643
* Reverts to the prior method of renaming
2015-06-16 09:48:09 -07:00
Charles Allen 6230ac90ae Use IndexMerger for conversion 2015-06-10 11:34:58 -07:00
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen 2a76bdc60a Abstractify hadoopy indexer configuration.
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
Xavier Léauté 4466e77b25 Merge pull request #1371 from guobingkun/unit_test
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task do not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Bingkun Guo b46aff12ae Unit test for IndexGeneratorJob 2015-05-18 12:31:16 -05:00
Fangjin Yang a2dc58cd2d Merge pull request #1345 from pjain1/unit_test_warn_fix
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain 01448d264c Fix warn msg and added some unit tests 2015-05-07 17:10:05 -05:00
fjy b19435d172 fix typos with batch ingestion in docs 2015-05-07 14:46:17 -07:00
Bingkun Guo 1ee550dd91 Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and add two unit tests for DeterminePartitionsJob and LocalDataSegmentKiller 2015-05-04 20:00:29 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar 3044bf5592 use Job.getInstance() to fix deprecated warnings 2015-04-09 13:22:21 -04:00
Xavier Léauté 8b5fa8f85d always upload SNAPSHOT self-contained jars 2015-04-03 21:18:09 -07:00
Dia Kharrat 3a6dc99384 log invalid rows in mapper of Hadoop indexer 2015-03-19 22:31:04 -07:00
Dia Kharrat 58d5f5e7f0 Honor ignoreInvalidRows in Hadoop indexer
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
Himanshu Gupta 8c1f0834ba Removing MapWritableInputRowParser from indexing-hadoop; it should really be an extension if the user needs it 2015-03-19 18:37:08 -05:00
Himanshu Gupta 3f7a7ba5d3 For batch hadoop indexing, make the hadoop input format configurable. The given input format must extend org.apache.hadoop.mapreduce.InputFormat 2015-03-18 16:09:45 -05:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
Himanshu Gupta 6a0405de20 fail early if there is no input data for batch hadoop indexing 2015-03-07 12:45:57 -06:00
Himanshu Gupta 30f64ff19e UTs update for indexing-hadoop 2015-02-25 15:45:57 -08:00
Xavier Léauté 0784d7e30e Merge pull request #1152 from himanshug/metastorage-pwd-provider
support for metadata store PasswordProvider interface
2015-02-25 15:19:37 -08:00
Fangjin Yang 708f35151d Merge pull request #1121 from gianm/issue-1116
Use the proper FileSystems for writing segments and caching jars. (for issue #1116)
2015-02-25 13:03:59 -08:00
Fangjin Yang 6424815f88 Merge pull request #1097 from metamx/better-hadoop-sort-key
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-25 12:49:58 -08:00
Himanshu Gupta 126262edce support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file 2015-02-25 14:05:19 -06:00
Himanshu Gupta 01a4f19ea2 removing dependency on NativeS3FileSystem and other file systems 2015-02-23 14:27:50 -06:00
Gian Merlino fd5a7d1f08 Use the proper FileSystems for writing segments and caching jars. (for issue #1116) 2015-02-12 16:20:10 -08:00
Xavier Léauté b1ec7afc12 Sort HadoopIndexer rows by time+dim bucket to help reduce spilling 2015-02-10 14:26:28 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté cd9635ff5e Merge pull request #1034 from druid-io/minor-rename
minor rename of things in hadoop ingestion config to match 0.6.x
2015-01-15 15:46:13 -08:00
fjy ccddbf8747 minor rename of things in hadoop ingestion config to match 0.6.x 2015-01-15 14:04:55 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Fangjin Yang 852e863425 Merge pull request #981 from druid-io/strictModuleTyping
Use Module instead of generic Object in Guice related items
2015-01-05 12:43:20 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Xavier Léauté f1375b0bfb workaround to pass down bitmap type to map-reduce tasks 2015-01-02 17:29:00 -08:00
Charles Allen 7c8d4a7433 Use Module instead of generic Object in Guice related items 2014-12-19 10:54:06 -08:00
fjy 43d27ddaf0 update http client and fix logging 2014-12-15 16:59:57 -08:00
fjy e872952390 fix working path default bug 2014-12-15 14:51:58 -08:00
fjy 28b72a69ad redocumenting ingestion 2014-12-08 16:15:46 -08:00
nishantmonu51 40f223215a fix buffer pool usage 2014-12-05 16:09:26 +05:30
nishantmonu51 6e03a6245f Merge branch 'master' into onheap-incremental-index 2014-12-05 10:40:28 +05:30
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00
Gian Merlino 20a7239ffd Replace google-http-client imports with real guava imports. 2014-12-04 10:57:57 -08:00
Charles Allen c2add5730b Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-04 10:57:56 -08:00
xvrl c867d59ee0 fix error message 2014-12-03 15:30:32 -08:00
Xavier Léauté 2e6c254937 metadata injection not needed for indexing service 2014-12-03 15:09:31 -08:00
Gian Merlino d388a8fe89 Replace google-http-client imports with real guava imports. 2014-12-03 10:52:57 -08:00
nishantmonu51 4dc0fdba8a consider mapped size in limit calculation & review comments 2014-12-03 23:47:30 +05:30
nishantmonu51 da8bd7836b Introduce buffer size 2014-12-03 16:28:22 +05:30
Charles Allen 7cd689be75 Fix Hadoop CLI jobs
* Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop
* Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec
2014-12-02 11:23:04 -08:00
nishantmonu51 eac776f1a7 tests passing with on heap incremental index 2014-12-02 22:29:28 +05:30
Xavier Léauté 59542c41f8 fix port not set in DruidNode 2014-12-01 14:37:28 -08:00
Charles Allen 8b3652a67a Modify HadoopDruidIndexerConfig to give a port of 0 instead of -1 when binding DruidNode @Self annotation 2014-12-01 14:08:41 -08:00
fjy fdeab0c6af make Druid case sensitive 2014-11-19 14:27:31 -08:00
nishantmonu51 f0452c5968 merge from master 2014-11-18 19:34:51 +05:30
nishantmonu51 edf0fc0851 Make hashed partitions spec default
- make hashed partitionsSpec as default partitions spec for 0.7
2014-11-17 19:48:12 +05:30
nishantmonu51 0c2d06475d merge from master 2014-11-17 19:19:18 +05:30
Xavier Léauté 0498df25df override metadata storage injection in CliHadoopIndexer 2014-11-07 13:44:56 -08:00
Xavier Léauté 50a191425c fix injection on MetadataStorageUpdaterJob 2014-11-07 11:11:14 -08:00
Xavier Léauté 20a9aef96a fix test 2014-11-06 17:27:05 -08:00
Xavier Léauté 9c06db021f rename db->metadata postgres->postgresql 2014-10-31 10:30:27 -07:00
jisookim0513 aa754b86e8 build success! 2014-10-24 11:28:42 -07:00
fjy bef74104d9 merge with 0.7.x and resolve any conflicts 2014-10-23 17:24:06 -07:00
fjy d76d57d95d update docs 2014-10-22 16:16:28 -07:00
jisookim0513 37979282fe enabled ansi-quote in mysql; insert statement should now work 2014-10-21 00:09:19 -07:00
jisookim0513 7d5c5f2083 fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider 2014-10-17 00:10:36 -07:00
nishantmonu51 41e88baeca Add test for bucket selection 2014-10-15 23:09:28 +05:30
nishantmonu51 f4a97aebbc fix rollup for hashed partitions
truncate timestamp while calculating the partitionNumber
2014-10-15 22:32:56 +05:30
nishantmonu51 b5d66381f3 more cleanup 2014-10-14 18:32:40 +05:30
nishantmonu51 454acd3f5a remove backwards compatible code
1) remove backwards compatible and deprecated code
2) make hashed partitions spec default
2014-10-13 19:30:44 +05:30
fjy c7b4d5b7b4 Merge branch 'master' into druid-0.7.x
Conflicts:
	processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java
2014-10-02 18:12:10 -07:00
nishantmonu51 ad75a21040 separate offheapIncrementalIndex implementation 2014-10-01 13:58:51 +05:30
jisookim0513 9d7b5d9b0f fixed javadoc; fixed pom files; deleted unnecessary class 2014-09-30 13:47:35 -07:00
nishantmonu51 358ff915bb fix merge conflicts 2014-09-30 22:19:18 +05:30
nishantmonu51 2789536bed merge changes from druid-0.7.x 2014-09-30 22:05:49 +05:30