778 Commits

Author SHA1 Message Date
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Himanshu Gupta 84f7d8d264 making static final variables in HadoopDruidIndexerConfig upper case 2015-11-02 23:24:26 -06:00
Himanshu Gupta 8b67417ac8 make methods in Index[Merger,Maker,IO] non-static so that they can have
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Himanshu Gupta aeffeaf3e2 fixing hadoop test scope dependencies in indexing-hadoop 2015-10-26 17:09:39 -05:00
Nishant 3641a0e553 Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582
few fixes

delete intermediate file early

better exception handling

use static pattern instead of compiling it every time

Add retry for transient exceptions

remove usage of deprecated method.

Add test

fix imports

fix javadoc

review comment.

review comment: handle crazy snapshot naming

review comments

remove default retry count in favour of already present constant

review comment

make random intermediate and final paths.

review comment, use temporaryFolder where possible
2015-10-22 21:41:07 +05:30
Xavier Léauté e4ac78e43d bump next snapshot to 0.9.0 2015-10-20 13:46:13 -07:00
Xavier Léauté 4c2c7a2c37 update version to 0.8.3 2015-10-14 21:40:55 -07:00
Himanshu Gupta 0368260018 For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE 2015-10-12 22:19:44 -05:00
Gian Merlino 3aba401ee0 SQLMetadataConnector: Retry table creation, in case something goes wrong.
Also rejigger table creation methods to not take a DBI. It's already available
inside the connector, and everyone was just using that one anyway.
2015-09-24 21:39:36 -07:00
Himanshu Gupta e8b9ee85a7 HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow 2015-09-16 10:58:13 -05:00
Himanshu Gupta 74f4572bd4 Lazily deserialize "parser" to InputRowParser in DataSchema
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Himanshu Gupta 9ca6106128 user specified hadoop settings are ignored if explicitly set in code 2015-08-31 10:50:18 -05:00
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
jon-wei e5c4927b14 Add support for parsing BytesWritable strings to Hadoop Indexer 2015-08-28 14:27:14 -07:00
Gian Merlino 414a6fb477 Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen e38cf54bc8 Migrate TestDerbyConnector to a JUnit @Rule 2015-08-26 21:47:40 -07:00
Himanshu Gupta b3c570e78d update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling 2015-08-20 21:36:34 -05:00
Himanshu Gupta 85e3ce9096 split hadoop glob path before adding it to MultipleInputs
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta a603bd9547 HadoopGlobPathSplitter implementation to split hadoop glob paths
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta cf3ec8eb46 helpful cause explaining why SegmentDescriptorInfo did not exist 2015-08-19 10:29:04 -05:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Himanshu Gupta a3bab5b7d9 IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing 2015-08-16 14:07:35 -05:00
Himanshu Gupta 15fa43dd43 changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta 45947a1021 add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
	indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java

Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
2015-08-16 13:15:38 -05:00
Himanshu Gupta 1ae56f139b Druid Hadoop InputFormat and pathSpec
Conflicts:
	indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
	indexing-service/pom.xml
2015-08-16 13:15:38 -05:00
Himanshu Gupta f1d309a671 do not run parser if value from InputFormat is already an InputRow 2015-08-14 14:44:22 -05:00
Himanshu Gupta 0eec1bbee2 json serde tests for HadoopTuningConfig 2015-07-20 12:01:53 -05:00
Himanshu Gupta f836c3a7ac adding flag useCombiner to hadoop tuning config that can be used to add a
hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible
2015-07-20 12:01:53 -05:00
Himanshu Gupta 4ef484048a take control of InputRow serde between Mapper/Reducer in Hadoop Indexing
This allows for arbitrary InputFormat while hadoop batch ingestion that
can return records of value type other than Text
2015-07-20 12:01:53 -05:00
Himanshu Gupta f7a92db332 generic byte[] serde for InputRow 2015-07-20 12:01:53 -05:00
Xavier Léauté 4cfb00bc8a inrement version 2015-07-15 13:09:05 -07:00
Charles Allen b2bc46be17 Merge pull request #1484 from tubemogul/feature/1463
JobHelper.ensurePaths will set job properties from config (tuningConf…
2015-07-07 10:58:16 -07:00
Michael Schiff 6ad451a44a JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

javadoc for addInputPaths to explain relationship with
addJobProperties
2015-07-01 12:45:32 -07:00
Davide Anastasia 4a3a7dd1ad read hadoop-indexer configuration file from HDFS 2015-06-24 14:08:53 -07:00
Hao Xia 1931491c9f A couple of hdfs related fixes
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Xavier Léauté 0a5bb909a2 [maven-release-plugin] prepare for next development iteration 2015-06-18 17:35:19 -07:00
Xavier Léauté 59c6b2b279 [maven-release-plugin] prepare release druid-0.8.0-rc1 2015-06-18 17:35:14 -07:00
Charles Allen 94a567732a Wipe FileContext off the face of the earth
* Fixes https://github.com/druid-io/druid/issues/1433
* Works arround https://issues.apache.org/jira/browse/HADOOP-10643
* Reverts to the prior method of renaming
2015-06-16 09:48:09 -07:00
Charles Allen 6230ac90ae Use IndexMerger for conversion 2015-06-10 11:34:58 -07:00
Charles Allen 056cab93ed Add Hadoop Converter Job and task
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen 2a76bdc60a Abstractify hadoopy indexer configuration.
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
Xavier Léauté 4466e77b25 Merge pull request #1371 from guobingkun/unit_test
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Bingkun Guo b46aff12ae Unit test for IndexGeneratorJob 2015-05-18 12:31:16 -05:00
fjy 7a6acf5c1b update pom to 0.8 2015-05-11 19:41:58 -06:00
Fangjin Yang a2dc58cd2d Merge pull request #1345 from pjain1/unit_test_warn_fix
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain 01448d264c Fix warn msg and added some unit tests 2015-05-07 17:10:05 -05:00
fjy b19435d172 fix typos with batch ingestion in docs 2015-05-07 14:46:17 -07:00
Bingkun Guo 1ee550dd91 Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminPartitionsJob and LocalDataSegmentKiller 2015-05-04 20:00:29 -07:00
Xavier Léauté 3a3046ccf3 add support for dimension compression
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar 3044bf5592 use Job.getInstance() to fix deprecated warnings 2015-04-09 13:22:21 -04:00
Xavier Léauté 8b5fa8f85d always upload SNAPSHOT self-contained jars 2015-04-03 21:18:09 -07:00
fjy aea7f9d192 [maven-release-plugin] prepare for next development iteration 2015-03-30 16:35:24 -07:00
fjy 060d7aef03 [maven-release-plugin] prepare release druid-0.7.1 2015-03-30 16:35:20 -07:00
Dia Kharrat 3a6dc99384 log invalid rows in mapper of Hadoop indexer 2015-03-19 22:31:04 -07:00
Dia Kharrat 58d5f5e7f0 Honor ignoreInvalidRows in Hadoop indexer
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
Himanshu Gupta 8c1f0834ba Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs 2015-03-19 18:37:08 -05:00
Xavier Léauté a98187f798 Merge pull request #1177 from himanshug/custom_input_format1
Feature:  Make hadoop input format configurable for batch ingestion
2015-03-19 15:49:36 -07:00
fjy b389cfe404 [maven-release-plugin] prepare for next development iteration 2015-03-19 12:38:17 -07:00
fjy 60e7d543cc [maven-release-plugin] prepare release druid-0.7.1-rc1 2015-03-19 12:38:13 -07:00
Himanshu Gupta 3f7a7ba5d3 For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat 2015-03-18 16:09:45 -05:00
fjy bfe10bd156 This fixes arbitrary gran spec breaking 2015-03-17 12:19:43 -07:00
Himanshu Gupta 6a0405de20 fail early if there is no input data for batch hadoop indexing 2015-03-07 12:45:57 -06:00
Himanshu Gupta 30f64ff19e UTs update for indexing-hadoop 2015-02-25 15:45:57 -08:00
Xavier Léauté 0784d7e30e Merge pull request #1152 from himanshug/metastorage-pwd-provider
support for metadata store PasswordProvider interface
2015-02-25 15:19:37 -08:00
Fangjin Yang 708f35151d Merge pull request #1121 from gianm/issue-1116
Use the proper FileSystems for writing segments and caching jars. (for issue #1116)
2015-02-25 13:03:59 -08:00
Fangjin Yang 6424815f88 Merge pull request #1097 from metamx/better-hadoop-sort-key
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-25 12:49:58 -08:00
Fangjin Yang 3d50a3771a Merge pull request #1151 from himanshug/remove-s3-fs-dep
removing dependency on NativeS3FileSystem and other file systems
2015-02-25 12:31:45 -08:00
Himanshu Gupta 126262edce support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file 2015-02-25 14:05:19 -06:00
Xavier Léauté b167dcf82c [maven-release-plugin] prepare for next development iteration 2015-02-23 14:28:06 -08:00
Xavier Léauté e81ac2ba43 [maven-release-plugin] prepare release druid-0.7.0 2015-02-23 14:27:58 -08:00
Himanshu Gupta 01a4f19ea2 removing dependency on NativeS3FileSystem and other file systems 2015-02-23 14:27:50 -06:00
Xavier Léauté 78df7f6165 Move Druid release artifacts to Sonatype
- Switch to using Druid parent POM
- Add required fields for Sonatype
- Common plugin versions and settings have been moved to the parent pom
- Cleanup artifacts and POMs for consistent formatting
- Remove org.hyperic.sigar dependency and update docs to reflect necessary jars to add at runtime when sigar is needed
2015-02-13 14:26:31 -08:00
Gian Merlino fd5a7d1f08 Use the proper FileSystems for writing segments and caching jars. (for issue #1116) 2015-02-12 16:20:10 -08:00
fjy d29740ed9f [maven-release-plugin] prepare for next development iteration 2015-02-12 16:16:00 -08:00
fjy 211fd15b7e [maven-release-plugin] prepare release druid-0.7.0-rc3 2015-02-12 16:15:56 -08:00
Xavier Léauté b1ec7afc12 Sort HadoopIndexer rows by time+dim bucket to help reduce spilling 2015-02-10 14:26:28 -08:00
fjy 1f12c5b2f1 [maven-release-plugin] prepare for next development iteration 2015-02-03 12:06:49 -08:00
fjy e82d431be7 [maven-release-plugin] prepare release druid-0.7.0-rc2 2015-02-03 12:06:41 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
fjy 1f94de22c6 [maven-release-plugin] prepare for next development iteration 2015-01-20 14:23:55 -08:00
fjy 17476edc31 [maven-release-plugin] prepare release druid-0.7.0-rc1 2015-01-20 14:23:51 -08:00
Xavier Léauté cd9635ff5e Merge pull request #1034 from druid-io/minor-rename
minor rename of things in hadoop ingestion config to match 0.6.x
2015-01-15 15:46:13 -08:00
fjy ccddbf8747 minor rename of things in hadoop ingestion config to match 0.6.x 2015-01-15 14:04:55 -08:00
Fangjin Yang 5bfcc43377 Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate
Update all String conversions to and from byte[] to use the java-util StringUtils functions
2015-01-15 13:50:27 -08:00
Fangjin Yang 852e863425 Merge pull request #981 from druid-io/strictModuleTyping
Use Module instead of generic Object in Guice related items
2015-01-05 12:43:20 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
Xavier Léauté f1375b0bfb workaround to pass down bitmap type to map-reduce tasks 2015-01-02 17:29:00 -08:00
Charles Allen 7c8d4a7433 Use Module instead of generic Object in Guice related items 2014-12-19 10:54:06 -08:00
fjy 43d27ddaf0 update http client and fix logging 2014-12-15 16:59:57 -08:00
fjy e872952390 fix working path default bug 2014-12-15 14:51:58 -08:00
fjy 28b72a69ad redocumenting ingestion 2014-12-08 16:15:46 -08:00
nishantmonu51 40f223215a fix buffer pool usage 2014-12-05 16:09:26 +05:30
nishantmonu51 6e03a6245f Merge branch 'master' into onheap-incremental-index 2014-12-05 10:40:28 +05:30
Xavier Léauté 7cd45a6e1f IncrementalIndex throws exception if limit exceeded
- For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes
- canAppendRow is a workaround for realtime index since the
Firehose currently does not have a way of rolling back the last event in
case of error
- canAppendRow needs a fudge factor; there is a race between checking
if we can add a row and actually adding a row, because of the way MapDB
reports its size.
2014-12-04 14:38:16 -08:00