binlijin
219367221b
optimize InputRowSerde
2015-12-09 09:51:56 +08:00
Fangjin Yang
d957a6602c
Merge pull request #2049 from himanshug/hadoop_indexing_unique_path
...
add a unique string to intermediate path for the hadoop indexing task
2015-12-07 11:46:16 -08:00
Himanshu Gupta
6cfaf59d7e
add a unique string to intermediate path for the hadoop indexing task
2015-12-06 22:20:38 -06:00
Himanshu Gupta
62ba9ade37
unifying license header in all java files
2015-12-05 22:16:23 -06:00
Himanshu Gupta
61aaa09012
support multiple intervals in dataSource input spec
2015-12-03 21:28:04 -06:00
Fangjin Yang
21c84b5ff7
Merge pull request #1896 from gianm/allocate-segment
...
SegmentAllocateAction (fixes #1515 )
2015-11-18 21:05:46 -08:00
Gian Merlino
e4e5f0375b
SegmentAllocateAction ( fixes #1515 )
...
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté
fa6142e217
cleanup and remove unused imports
2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a
Add backwards compatability for PR #1922
2015-11-11 10:27:00 -08:00
Gian Merlino
dfbd0e2b60
Merge pull request #1925 from gianm/fix-index-generator
...
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:56:30 -08:00
Gian Merlino
75122dc396
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:19:58 -08:00
Himanshu Gupta
6bed633121
do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob
2015-11-06 00:40:51 -06:00
Charles Allen
929b981710
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-11-05 18:10:13 -08:00
Xavier Léauté
223d1ebe9f
fix a very old todo
2015-11-05 13:00:30 -08:00
fjy
8f231fd3e3
cleanup druid codebase
2015-11-04 13:59:53 -08:00
Himanshu Gupta
84f7d8d264
making static final variables in HadoopDruidIndexerConfig upper case
2015-11-02 23:24:26 -06:00
Himanshu Gupta
8b67417ac8
make methods in Index[Merger,Maker,IO] non-static so that they can have
...
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Himanshu Gupta
aeffeaf3e2
fixing hadoop test scope dependencies in indexing-hadoop
2015-10-26 17:09:39 -05:00
Nishant
3641a0e553
Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582
...
few fixes
delete intermediate file early
better exception handling
use static pattern instead of compiling it every time
Add retry for transient exceptions
remove usage of deprecated method.
Add test
fix imports
fix javadoc
review comment.
review comment: handle crazy snapshot naming
review comments
remove default retry count in favour of already present constant
review comment
make random intermediate and final paths.
review comment, use temporaryFolder where possible
2015-10-22 21:41:07 +05:30
Xavier Léauté
e4ac78e43d
bump next snapshot to 0.9.0
2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37
update version to 0.8.3
2015-10-14 21:40:55 -07:00
Himanshu Gupta
0368260018
For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE
2015-10-12 22:19:44 -05:00
Gian Merlino
3aba401ee0
SQLMetadataConnector: Retry table creation, in case something goes wrong.
...
Also rejigger table creation methods to not take a DBI. It's already available
inside the connector, and everyone was just using that one anyway.
2015-09-24 21:39:36 -07:00
Himanshu Gupta
e8b9ee85a7
HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow
2015-09-16 10:58:13 -05:00
Himanshu Gupta
74f4572bd4
Lazily deserialize "parser" to InputRowParser in DataSchema
...
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Himanshu Gupta
9ca6106128
user specified hadoop settings are ignored if explicitly set in code
2015-08-31 10:50:18 -05:00
Gian Merlino
940e1aa3eb
Replace funky imports with standard ones.
...
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
jon-wei
e5c4927b14
Add support for parsing BytesWritable strings to Hadoop Indexer
2015-08-28 14:27:14 -07:00
Gian Merlino
414a6fb477
Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
...
Fixes #1678 . IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta
2e0dd1d792
adding UTs and addressing review comments to
...
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq
2237a8cf0f
kafka 8 simple consumer firehose
2015-08-27 20:50:46 -05:00
Charles Allen
e38cf54bc8
Migrate TestDerbyConnector to a JUnit @Rule
2015-08-26 21:47:40 -07:00
Himanshu Gupta
b3c570e78d
update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling
2015-08-20 21:36:34 -05:00
Himanshu Gupta
85e3ce9096
split hadoop glob path before adding it to MultipleInputs
...
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta
a603bd9547
HadoopGlobPathSplitter implementation to split hadoop glob paths
...
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta
cf3ec8eb46
helpful cause explaining why SegmentDescriptorInfo did not exist
2015-08-19 10:29:04 -05:00
Xavier Léauté
3b2e41e42a
update for next release
2015-08-18 17:16:46 -07:00
Himanshu Gupta
a3bab5b7d9
IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing
2015-08-16 14:07:35 -05:00
Himanshu Gupta
15fa43dd43
changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
...
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta
45947a1021
add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion
...
Conflicts:
indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java
Conflicts:
indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
2015-08-16 13:15:38 -05:00
Himanshu Gupta
1ae56f139b
Druid Hadoop InputFormat and pathSpec
...
Conflicts:
indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
indexing-service/pom.xml
2015-08-16 13:15:38 -05:00
Himanshu Gupta
f1d309a671
do not run parser if value from InputFormat is already an InputRow
2015-08-14 14:44:22 -05:00
Himanshu Gupta
0eec1bbee2
json serde tests for HadoopTuningConfig
2015-07-20 12:01:53 -05:00
Himanshu Gupta
f836c3a7ac
adding flag useCombiner to hadoop tuning config that can be used to add a
...
hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible
2015-07-20 12:01:53 -05:00
Himanshu Gupta
4ef484048a
take control of InputRow serde between Mapper/Reducer in Hadoop Indexing
...
This allows for arbitrary InputFormat while hadoop batch ingestion that
can return records of value type other than Text
2015-07-20 12:01:53 -05:00
Himanshu Gupta
f7a92db332
generic byte[] serde for InputRow
2015-07-20 12:01:53 -05:00
Xavier Léauté
4cfb00bc8a
inrement version
2015-07-15 13:09:05 -07:00
Charles Allen
b2bc46be17
Merge pull request #1484 from tubemogul/feature/1463
...
JobHelper.ensurePaths will set job properties from config (tuningConf…
2015-07-07 10:58:16 -07:00
Michael Schiff
6ad451a44a
JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
...
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.
IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set)
JobHelperTest confirms that JobHelper.ensurePaths adds job properties
javadoc for addInputPaths to explain relationship with
addJobProperties
2015-07-01 12:45:32 -07:00
Davide Anastasia
4a3a7dd1ad
read hadoop-indexer configuration file from HDFS
2015-06-24 14:08:53 -07:00
Hao Xia
1931491c9f
A couple of hdfs related fixes
...
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Xavier Léauté
0a5bb909a2
[maven-release-plugin] prepare for next development iteration
2015-06-18 17:35:19 -07:00
Xavier Léauté
59c6b2b279
[maven-release-plugin] prepare release druid-0.8.0-rc1
2015-06-18 17:35:14 -07:00
Charles Allen
94a567732a
Wipe FileContext off the face of the earth
...
* Fixes https://github.com/druid-io/druid/issues/1433
* Works arround https://issues.apache.org/jira/browse/HADOOP-10643
* Reverts to the prior method of renaming
2015-06-16 09:48:09 -07:00
Charles Allen
6230ac90ae
Use IndexMerger for conversion
2015-06-10 11:34:58 -07:00
Charles Allen
056cab93ed
Add Hadoop Converter Job and task
...
* Fixes https://github.com/druid-io/druid/issues/1363
* Add extra utils in JobHelper based on PR feedback
2015-06-09 14:47:38 -07:00
Charles Allen
2a76bdc60a
Abstractify hadoopy indexer configuration.
...
* Moves many items to JobHelper
* Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items
* Changes functionalities of some of the path methods to always return a path with scheme
* Adds retry to uploads
* Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()
2015-06-08 10:53:27 -07:00
fjy
be2a35188e
Additional schema validations and better logs for common extensions
2015-05-27 16:25:02 -07:00
Xavier Léauté
4466e77b25
Merge pull request #1371 from guobingkun/unit_test
...
Unit test for IndexGeneratorJob
2015-05-22 10:34:24 -04:00
flow
07659f30ab
bug fix: hdfs task log and indexing task not work properly with Hadoop HA
2015-05-21 20:49:42 +08:00
Bingkun Guo
b46aff12ae
Unit test for IndexGeneratorJob
2015-05-18 12:31:16 -05:00
fjy
7a6acf5c1b
update pom to 0.8
2015-05-11 19:41:58 -06:00
Fangjin Yang
a2dc58cd2d
Merge pull request #1345 from pjain1/unit_test_warn_fix
...
fix warn msg and some unit tests
2015-05-08 08:06:20 -07:00
Parag Jain
01448d264c
Fix warn msg and added some unit tests
2015-05-07 17:10:05 -05:00
fjy
b19435d172
fix typos with batch ingestion in docs
2015-05-07 14:46:17 -07:00
Bingkun Guo
1ee550dd91
Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminPartitionsJob and LocalDataSegmentKiller
2015-05-04 20:00:29 -07:00
Xavier Léauté
3a3046ccf3
add support for dimension compression
...
- compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier
- makes dimension compression configurable via IndexSpec
- IndexSpec also enables configuring bitmap and metric compression
2015-04-14 10:44:18 -07:00
Prajwal Tuladhar
3044bf5592
use Job.getInstance() to fix deprecated warnings
2015-04-09 13:22:21 -04:00
Xavier Léauté
8b5fa8f85d
always upload SNAPSHOT self-contained jars
2015-04-03 21:18:09 -07:00
fjy
aea7f9d192
[maven-release-plugin] prepare for next development iteration
2015-03-30 16:35:24 -07:00
fjy
060d7aef03
[maven-release-plugin] prepare release druid-0.7.1
2015-03-30 16:35:20 -07:00
Dia Kharrat
3a6dc99384
log invalid rows in mapper of Hadoop indexer
2015-03-19 22:31:04 -07:00
Dia Kharrat
58d5f5e7f0
Honor ignoreInvalidRows in Hadoop indexer
...
The reducer of the hadoop indexer now ignores lines with parsing
exceptions (if enabled by the indexer config).
2015-03-19 22:31:04 -07:00
Himanshu Gupta
8c1f0834ba
Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs
2015-03-19 18:37:08 -05:00
Xavier Léauté
a98187f798
Merge pull request #1177 from himanshug/custom_input_format1
...
Feature: Make hadoop input format configurable for batch ingestion
2015-03-19 15:49:36 -07:00
fjy
b389cfe404
[maven-release-plugin] prepare for next development iteration
2015-03-19 12:38:17 -07:00
fjy
60e7d543cc
[maven-release-plugin] prepare release druid-0.7.1-rc1
2015-03-19 12:38:13 -07:00
Himanshu Gupta
3f7a7ba5d3
For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat
2015-03-18 16:09:45 -05:00
fjy
bfe10bd156
This fixes arbitrary gran spec breaking
2015-03-17 12:19:43 -07:00
Himanshu Gupta
6a0405de20
fail early if there is no input data for batch hadoop indexing
2015-03-07 12:45:57 -06:00
Himanshu Gupta
30f64ff19e
UTs update for indexing-hadoop
2015-02-25 15:45:57 -08:00
Xavier Léauté
0784d7e30e
Merge pull request #1152 from himanshug/metastorage-pwd-provider
...
support for metadata store PasswordProvider interface
2015-02-25 15:19:37 -08:00
Fangjin Yang
708f35151d
Merge pull request #1121 from gianm/issue-1116
...
Use the proper FileSystems for writing segments and caching jars. (for issue #1116 )
2015-02-25 13:03:59 -08:00
Fangjin Yang
6424815f88
Merge pull request #1097 from metamx/better-hadoop-sort-key
...
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-25 12:49:58 -08:00
Fangjin Yang
3d50a3771a
Merge pull request #1151 from himanshug/remove-s3-fs-dep
...
removing dependency on NativeS3FileSystem and other file systems
2015-02-25 12:31:45 -08:00
Himanshu Gupta
126262edce
support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file
2015-02-25 14:05:19 -06:00
Xavier Léauté
b167dcf82c
[maven-release-plugin] prepare for next development iteration
2015-02-23 14:28:06 -08:00
Xavier Léauté
e81ac2ba43
[maven-release-plugin] prepare release druid-0.7.0
2015-02-23 14:27:58 -08:00
Himanshu Gupta
01a4f19ea2
removing dependency on NativeS3FileSystem and other file systems
2015-02-23 14:27:50 -06:00
Xavier Léauté
78df7f6165
Move Druid release artifacts to Sonatype
...
- Switch to using Druid parent POM
- Add required fields for Sonatype
- Common plugin versions and settings have been moved to the parent pom
- Cleanup artifacts and POMs for consistent formatting
- Remove org.hyperic.sigar dependency and update docs to reflect necessary jars to add at runtime when sigar is needed
2015-02-13 14:26:31 -08:00
Gian Merlino
fd5a7d1f08
Use the proper FileSystems for writing segments and caching jars. (for issue #1116 )
2015-02-12 16:20:10 -08:00
fjy
d29740ed9f
[maven-release-plugin] prepare for next development iteration
2015-02-12 16:16:00 -08:00
fjy
211fd15b7e
[maven-release-plugin] prepare release druid-0.7.0-rc3
2015-02-12 16:15:56 -08:00
Xavier Léauté
b1ec7afc12
Sort HadoopIndexer rows by time+dim bucket to help reduce spilling
2015-02-10 14:26:28 -08:00
fjy
1f12c5b2f1
[maven-release-plugin] prepare for next development iteration
2015-02-03 12:06:49 -08:00
fjy
e82d431be7
[maven-release-plugin] prepare release druid-0.7.0-rc2
2015-02-03 12:06:41 -08:00
Fangjin Yang
92e616de11
Merge pull request #1077 from metamx/remove-unused-imports
...
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51
ba932bb1f2
remove unused imports
2015-02-02 21:53:39 +05:30
fjy
d05032b98a
towards a community led druid
2015-01-31 20:57:36 -08:00
fjy
1f94de22c6
[maven-release-plugin] prepare for next development iteration
2015-01-20 14:23:55 -08:00