druid

Commit Graph

Author	SHA1	Message	Date
jon-wei	e5c4927b14	Add support for parsing BytesWritable strings to Hadoop Indexer	2015-08-28 14:27:14 -07:00
Gian Merlino	414a6fb477	Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat. Fixes #1678. IngestSegmentFirehose (and its users) need to remember which windows of which segments should actually be read, based on a timeline.	2015-08-28 07:32:41 -07:00
Himanshu Gupta	2e0dd1d792	adding UTs and addressing review comments to firehoseV2 addition to Realtime[Manager|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch	2015-08-27 20:50:46 -05:00
lvjq	2237a8cf0f	kafka 8 simple consumer firehose	2015-08-27 20:50:46 -05:00
Charles Allen	e38cf54bc8	Migrate TestDerbyConnector to a JUnit @Rule	2015-08-26 21:47:40 -07:00
Himanshu Gupta	b3c570e78d	update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling	2015-08-20 21:36:34 -05:00
Himanshu Gupta	85e3ce9096	split hadoop glob path before adding it to MultipleInputs This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	a603bd9547	HadoopGlobPathSplitter implementation to split hadoop glob paths This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	cf3ec8eb46	helpful cause explaining why SegmentDescriptorInfo did not exist	2015-08-19 10:29:04 -05:00
Himanshu Gupta	a3bab5b7d9	IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing	2015-08-16 14:07:35 -05:00
Himanshu Gupta	15fa43dd43	changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up	2015-08-16 14:07:35 -05:00
Himanshu Gupta	45947a1021	add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java	2015-08-16 13:15:38 -05:00
Himanshu Gupta	1ae56f139b	Druid Hadoop InputFormat and pathSpec Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java indexing-service/pom.xml	2015-08-16 13:15:38 -05:00
Himanshu Gupta	f1d309a671	do not run parser if value from InputFormat is already an InputRow	2015-08-14 14:44:22 -05:00
Himanshu Gupta	0eec1bbee2	json serde tests for HadoopTuningConfig	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f836c3a7ac	adding flag useCombiner to hadoop tuning config that can be used to add a hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible	2015-07-20 12:01:53 -05:00
Himanshu Gupta	4ef484048a	take control of InputRow serde between Mapper/Reducer in Hadoop Indexing This allows for arbitrary InputFormat while hadoop batch ingestion that can return records of value type other than Text	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f7a92db332	generic byte[] serde for InputRow	2015-07-20 12:01:53 -05:00
Charles Allen	b2bc46be17	Merge pull request #1484 from tubemogul/feature/1463 JobHelper.ensurePaths will set job properties from config (tuningConf…	2015-07-07 10:58:16 -07:00
Michael Schiff	6ad451a44a	JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config. Adding input paths will create Path and FileSystem instances which may depend on the values in the job config. This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files. IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set) JobHelperTest confirms that JobHelper.ensurePaths adds job properties javadoc for addInputPaths to explain relationship with addJobProperties	2015-07-01 12:45:32 -07:00
Davide Anastasia	4a3a7dd1ad	read hadoop-indexer configuration file from HDFS	2015-06-24 14:08:53 -07:00
Hao Xia	1931491c9f	A couple of hdfs related fixes * Class loading issue with hdfs-storage extension * Exception when using hdfs with non-fully qualified segment path	2015-06-19 17:22:20 -07:00
Charles Allen	94a567732a	Wipe FileContext off the face of the earth * Fixes https://github.com/druid-io/druid/issues/1433 * Works arround https://issues.apache.org/jira/browse/HADOOP-10643 * Reverts to the prior method of renaming	2015-06-16 09:48:09 -07:00
Charles Allen	6230ac90ae	Use IndexMerger for conversion	2015-06-10 11:34:58 -07:00
Charles Allen	056cab93ed	Add Hadoop Converter Job and task * Fixes https://github.com/druid-io/druid/issues/1363 * Add extra utils in JobHelper based on PR feedback	2015-06-09 14:47:38 -07:00
Charles Allen	2a76bdc60a	Abstractify hadoopy indexer configuration. * Moves many items to JobHelper * Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items * Changes functionalities of some of the path methods to always return a path with scheme * Adds retry to uploads * Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()	2015-06-08 10:53:27 -07:00
fjy	be2a35188e	Additional schema validations and better logs for common extensions	2015-05-27 16:25:02 -07:00
Xavier Léauté	4466e77b25	Merge pull request #1371 from guobingkun/unit_test Unit test for IndexGeneratorJob	2015-05-22 10:34:24 -04:00
flow	07659f30ab	bug fix: hdfs task log and indexing task not work properly with Hadoop HA	2015-05-21 20:49:42 +08:00
Bingkun Guo	b46aff12ae	Unit test for IndexGeneratorJob	2015-05-18 12:31:16 -05:00
Fangjin Yang	a2dc58cd2d	Merge pull request #1345 from pjain1/unit_test_warn_fix fix warn msg and some unit tests	2015-05-08 08:06:20 -07:00
Parag Jain	01448d264c	Fix warn msg and added some unit tests	2015-05-07 17:10:05 -05:00
fjy	b19435d172	fix typos with batch ingestion in docs	2015-05-07 14:46:17 -07:00
Bingkun Guo	1ee550dd91	Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminPartitionsJob and LocalDataSegmentKiller	2015-05-04 20:00:29 -07:00
Xavier Léauté	3a3046ccf3	add support for dimension compression - compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier - makes dimension compression configurable via IndexSpec - IndexSpec also enables configuring bitmap and metric compression	2015-04-14 10:44:18 -07:00
Prajwal Tuladhar	3044bf5592	use Job.getInstance() to fix deprecated warnings	2015-04-09 13:22:21 -04:00
Xavier Léauté	8b5fa8f85d	always upload SNAPSHOT self-contained jars	2015-04-03 21:18:09 -07:00
Dia Kharrat	3a6dc99384	log invalid rows in mapper of Hadoop indexer	2015-03-19 22:31:04 -07:00
Dia Kharrat	58d5f5e7f0	Honor ignoreInvalidRows in Hadoop indexer The reducer of the hadoop indexer now ignores lines with parsing exceptions (if enabled by the indexer config).	2015-03-19 22:31:04 -07:00
Himanshu Gupta	8c1f0834ba	Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs	2015-03-19 18:37:08 -05:00
Himanshu Gupta	3f7a7ba5d3	For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat	2015-03-18 16:09:45 -05:00
fjy	bfe10bd156	This fixes arbitrary gran spec breaking	2015-03-17 12:19:43 -07:00
Himanshu Gupta	6a0405de20	fail early if there is no input data for batch hadoop indexing	2015-03-07 12:45:57 -06:00
Himanshu Gupta	30f64ff19e	UTs update for indexing-hadoop	2015-02-25 15:45:57 -08:00
Xavier Léauté	0784d7e30e	Merge pull request #1152 from himanshug/metastorage-pwd-provider support for metadata store PasswordProvider interface	2015-02-25 15:19:37 -08:00
Fangjin Yang	708f35151d	Merge pull request #1121 from gianm/issue-1116 Use the proper FileSystems for writing segments and caching jars. (for issue #1116)	2015-02-25 13:03:59 -08:00
Fangjin Yang	6424815f88	Merge pull request #1097 from metamx/better-hadoop-sort-key Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-25 12:49:58 -08:00
Himanshu Gupta	126262edce	support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file	2015-02-25 14:05:19 -06:00
Himanshu Gupta	01a4f19ea2	removing dependency on NativeS3FileSystem and other file systems	2015-02-23 14:27:50 -06:00
Gian Merlino	fd5a7d1f08	Use the proper FileSystems for writing segments and caching jars. (for issue #1116 )	2015-02-12 16:20:10 -08:00
Xavier Léauté	b1ec7afc12	Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-10 14:26:28 -08:00
Fangjin Yang	92e616de11	Merge pull request #1077 from metamx/remove-unused-imports remove unused imports	2015-02-02 10:45:27 -08:00
nishantmonu51	ba932bb1f2	remove unused imports	2015-02-02 21:53:39 +05:30
fjy	d05032b98a	towards a community led druid	2015-01-31 20:57:36 -08:00
Xavier Léauté	cd9635ff5e	Merge pull request #1034 from druid-io/minor-rename minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 15:46:13 -08:00
fjy	ccddbf8747	minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 14:04:55 -08:00
Fangjin Yang	5bfcc43377	Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate Update all String conversions to and from byte[] to use the java-util StringUtils functions	2015-01-15 13:50:27 -08:00
Fangjin Yang	852e863425	Merge pull request #981 from druid-io/strictModuleTyping Use Module instead of generic Object in Guice related items	2015-01-05 12:43:20 -08:00
Charles Allen	b1b5c9099e	Update all String conversions to and from byte[] to use the java-util StringUtils functions * Speedup of GroupBy with javaScript filters by ~10% * Requires https://github.com/metamx/java-util/pull/15	2015-01-05 11:22:32 -08:00
Xavier Léauté	f1375b0bfb	workaround to pass down bitmap type to map-reduce tasks	2015-01-02 17:29:00 -08:00
Charles Allen	7c8d4a7433	Use Module instead of generic Object in Guice related items	2014-12-19 10:54:06 -08:00
fjy	43d27ddaf0	update http client and fix logging	2014-12-15 16:59:57 -08:00
fjy	e872952390	fix working path default bug	2014-12-15 14:51:58 -08:00
fjy	28b72a69ad	redocumenting ingestion	2014-12-08 16:15:46 -08:00
nishantmonu51	40f223215a	fix buffer pool usage	2014-12-05 16:09:26 +05:30
nishantmonu51	6e03a6245f	Merge branch 'master' into onheap-incremental-index	2014-12-05 10:40:28 +05:30
Xavier Léauté	7cd45a6e1f	IncrementalIndex throws exception if limit exceeded - For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes - canAppendRow is a workaround for realtime index since the Firehose currently does not have a way of rolling back the last event in case of error - canAppendRow needs a fudge factor; there is a race between checking if we can add a row and actually adding a row, because of the way MapDB reports its size.	2014-12-04 14:38:16 -08:00
Gian Merlino	20a7239ffd	Replace google-http-client imports with real guava imports.	2014-12-04 10:57:57 -08:00
Charles Allen	c2add5730b	Fix Hadoop CLI jobs * Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop * Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec	2014-12-04 10:57:56 -08:00
xvrl	c867d59ee0	fix error message	2014-12-03 15:30:32 -08:00
Xavier Léauté	2e6c254937	metadata injection not needed for indexing service	2014-12-03 15:09:31 -08:00
Gian Merlino	d388a8fe89	Replace google-http-client imports with real guava imports.	2014-12-03 10:52:57 -08:00
nishantmonu51	4dc0fdba8a	consider mapped size in limit calculation & review comments	2014-12-03 23:47:30 +05:30
nishantmonu51	da8bd7836b	Introduce buffer size	2014-12-03 16:28:22 +05:30
Charles Allen	7cd689be75	Fix Hadoop CLI jobs * Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop * Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec	2014-12-02 11:23:04 -08:00
nishantmonu51	eac776f1a7	tests passing with on heap incremental index	2014-12-02 22:29:28 +05:30
Xavier Léauté	59542c41f8	fix port not set in DruidNode	2014-12-01 14:37:28 -08:00
Charles Allen	8b3652a67a	Modify HadoopDruidIndexerConfig to give a port of 0 instead of -1 when binding DruidNode @Self annotation	2014-12-01 14:08:41 -08:00
fjy	fdeab0c6af	make Druid case sensitive	2014-11-19 14:27:31 -08:00
nishantmonu51	f0452c5968	merge from master	2014-11-18 19:34:51 +05:30
nishantmonu51	edf0fc0851	Make hashed partitions spec default - make hashed partitionsSpec as default partitions spec for 0.7	2014-11-17 19:48:12 +05:30
nishantmonu51	0c2d06475d	merge from master	2014-11-17 19:19:18 +05:30
Xavier Léauté	0498df25df	override metadata storage injection in CliHadoopIndexer	2014-11-07 13:44:56 -08:00
Xavier Léauté	50a191425c	fix injection on MetadataStorageUpdaterJob	2014-11-07 11:11:14 -08:00
Xavier Léauté	20a9aef96a	fix test	2014-11-06 17:27:05 -08:00
Xavier Léauté	9c06db021f	rename db->metadata postgres->postgresql	2014-10-31 10:30:27 -07:00
jisookim0513	aa754b86e8	build success!	2014-10-24 11:28:42 -07:00
fjy	bef74104d9	merge with 0.7.x and resolve any conflicts	2014-10-23 17:24:06 -07:00
fjy	d76d57d95d	update docs	2014-10-22 16:16:28 -07:00
jisookim0513	37979282fe	enabled ansi-quote in mysql; insert statement should now work	2014-10-21 00:09:19 -07:00
jisookim0513	7d5c5f2083	fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider	2014-10-17 00:10:36 -07:00
nishantmonu51	41e88baeca	Add test for bucket selection	2014-10-15 23:09:28 +05:30
nishantmonu51	f4a97aebbc	fix rollup for hashed partitions truncate timestamp while calculating the partitionNumber	2014-10-15 22:32:56 +05:30
nishantmonu51	b5d66381f3	more cleanup	2014-10-14 18:32:40 +05:30
nishantmonu51	454acd3f5a	remove backwards compatible code 1) remove backwards compatible and deprecated code 2) make hashed partitions spec default	2014-10-13 19:30:44 +05:30
fjy	c7b4d5b7b4	Merge branch 'master' into druid-0.7.x Conflicts: processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java	2014-10-02 18:12:10 -07:00
nishantmonu51	ad75a21040	separate offheapIncrementalIndex implementation	2014-10-01 13:58:51 +05:30
jisookim0513	9d7b5d9b0f	fixed javadoc; fixed pom files; deleted unnecessary class	2014-09-30 13:47:35 -07:00
nishantmonu51	358ff915bb	fix merge conflicts	2014-09-30 22:19:18 +05:30
nishantmonu51	2789536bed	merge changes from druid-0.7.x	2014-09-30 22:05:49 +05:30

270 Commits