717 Commits

Author SHA1 Message Date
Himanshu Gupta 9ca6106128 user specified hadoop settings are ignored if explicitly set in code 2015-08-31 10:50:18 -05:00
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
jon-wei e5c4927b14 Add support for parsing BytesWritable strings to Hadoop Indexer 2015-08-28 14:27:14 -07:00
Gian Merlino 414a6fb477 Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen e38cf54bc8 Migrate TestDerbyConnector to a JUnit @Rule 2015-08-26 21:47:40 -07:00
Himanshu Gupta b3c570e78d update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling 2015-08-20 21:36:34 -05:00
Himanshu Gupta 85e3ce9096 split hadoop glob path before adding it to MultipleInputs
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta a603bd9547 HadoopGlobPathSplitter implementation to split hadoop glob paths
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta cf3ec8eb46 helpful cause explaining why SegmentDescriptorInfo did not exist 2015-08-19 10:29:04 -05:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Himanshu Gupta a3bab5b7d9 IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing 2015-08-16 14:07:35 -05:00
Himanshu Gupta 15fa43dd43 changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta 45947a1021 add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java

Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
2015-08-16 13:15:38 -05:00
Himanshu Gupta 1ae56f139b Druid Hadoop InputFormat and pathSpec
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
	indexing-service/pom.xml
2015-08-16 13:15:38 -05:00
Himanshu Gupta f1d309a671 do not run parser if value from InputFormat is already an InputRow 2015-08-14 14:44:22 -05:00
Himanshu Gupta 0eec1bbee2 json serde tests for HadoopTuningConfig 2015-07-20 12:01:53 -05:00
Himanshu Gupta f836c3a7ac adding flag useCombiner to hadoop tuning config that can be used to add a
hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible
2015-07-20 12:01:53 -05:00
Himanshu Gupta 4ef484048a take control of InputRow serde between Mapper/Reducer in Hadoop Indexing
This allows for arbitrary InputFormat while hadoop batch ingestion that
can return records of value type other than Text
2015-07-20 12:01:53 -05:00
Himanshu Gupta f7a92db332 generic byte[] serde for InputRow 2015-07-20 12:01:53 -05:00
Xavier Léauté 4cfb00bc8a increment version 2015-07-15 13:09:05 -07:00
Charles Allen b2bc46be17 Merge pull request #1484 from tubemogul/feature/1463
JobHelper.ensurePaths will set job properties from config (tuningConf…
2015-07-07 10:58:16 -07:00
Michael Schiff 6ad451a44a JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properties set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

javadoc for addInputPaths to explain relationship with
addJobProperties
2015-07-01 12:45:32 -07:00
Davide Anastasia 4a3a7dd1ad read hadoop-indexer configuration file from HDFS 2015-06-24 14:08:53 -07:00
Hao Xia 1931491c9f A couple of hdfs related fixes
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Xavier Léauté 0a5bb909a2 [maven-release-plugin] prepare for next development iteration 2015-06-18 17:35:19 -07:00
Xavier Léauté 59c6b2b279 [maven-release-plugin] prepare release druid-0.8.0-rc1 2015-06-18 17:35:14 -07:00
Charles Allen 94a567732a Wipe FileContext off the face of the earth
* Fixes https://github.com/druid-io/druid/issues/1433
* Works around https://issues.apache.org/jira/browse/HADOOP-10643
* Reverts to the prior method of renaming
2015-06-16 09:48:09 -07:00
Charles Allen 6230ac90ae Use IndexMerger for conversion 2015-06-10 11:34:58 -07:00
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen 2a76bdc60a Abstractify hadoopy indexer configuration.
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
Xavier Léauté 4466e77b25 Merge pull request #1371 from guobingkun/unit_test
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Bingkun Guo b46aff12ae Unit test for IndexGeneratorJob 2015-05-18 12:31:16 -05:00
fjy 7a6acf5c1b update pom to 0.8 2015-05-11 19:41:58 -06:00
Fangjin Yang a2dc58cd2d Merge pull request #1345 from pjain1/unit_test_warn_fix
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain 01448d264c Fix warn msg and added some unit tests 2015-05-07 17:10:05 -05:00
fjy b19435d172 fix typos with batch ingestion in docs 2015-05-07 14:46:17 -07:00
Bingkun Guo 1ee550dd91 Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminePartitionsJob and LocalDataSegmentKiller 2015-05-04 20:00:29 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar 3044bf5592 use Job.getInstance() to fix deprecated warnings 2015-04-09 13:22:21 -04:00
Xavier Léauté 8b5fa8f85d always upload SNAPSHOT self-contained jars 2015-04-03 21:18:09 -07:00
fjy aea7f9d192 [maven-release-plugin] prepare for next development iteration 2015-03-30 16:35:24 -07:00
fjy 060d7aef03 [maven-release-plugin] prepare release druid-0.7.1 2015-03-30 16:35:20 -07:00
Dia Kharrat 3a6dc99384 log invalid rows in mapper of Hadoop indexer 2015-03-19 22:31:04 -07:00
Dia Kharrat 58d5f5e7f0 Honor ignoreInvalidRows in Hadoop indexer
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
Himanshu Gupta 8c1f0834ba Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs 2015-03-19 18:37:08 -05:00
Xavier Léauté a98187f798 Merge pull request #1177 from himanshug/custom_input_format1
Feature:  Make hadoop input format configurable for batch ingestion
2015-03-19 15:49:36 -07:00