Himanshu Gupta
09ffcae4ae
give user the option to specify the segments for dataSource inputSpec
2016-02-21 23:15:31 -06:00
Himanshu Gupta
2faae9d0d1
In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path
2016-02-09 21:42:32 -06:00
Himanshu Gupta
b3437825f0
add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found
2016-01-26 17:23:55 -06:00
binlijin
cd1c71ceb4
rename persistBackgroundCount to numBackgroundPersistThreads
2016-01-22 14:29:41 +08:00
Charles Allen
2a69a58570
Merge pull request #2149 from binlijin/master
...
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Fangjin Yang
996c1173c6
Merge pull request #2223 from navis/besteffort-split-locations
...
Best effort to find locations for input splits
2016-01-20 16:53:43 -08:00
Fangjin Yang
695f107870
Merge pull request #2302 from metamx/lowerCaseGranPathTest
...
Make GranularityPathSpecTest check with lower-case enums
2016-01-20 09:18:06 -08:00
Charles Allen
3c5ca3a5f2
Make GranularityPathSpecTest check with lower-case enums
2016-01-20 08:35:13 -08:00
binlijin
8e43e2c446
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 09:20:09 +08:00
jon-wei
747343e621
Preserve dimension order across indexes during ingestion
2016-01-19 13:34:11 -08:00
Jonathan Wei
df2906a91c
Merge pull request #2290 from gianm/index-merger-v9-stuff
...
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.
2016-01-19 13:04:00 -08:00
Gian Merlino
1dcf22edb7
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes.
...
Also parameterize some tests to run with/without buildV9Directly:
- IndexGeneratorJobTest
- RealtimeIndexTaskTest
- RealtimePlumberSchoolTest
2016-01-19 12:15:06 -08:00
Himanshu Gupta
164b0aad7a
removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class
...
instead of a Map to store segment metadata
2016-01-18 22:03:46 -06:00
navis.ryu
f03f7fb625
Best effort to find locations for input splits
2016-01-18 08:31:05 +09:00
Kurt Young
82ff98c2bf
add config for build v9 directly and update docs
2016-01-16 11:26:34 +08:00
Kurt Young
1f2168fae5
add IndexMergerV9
...
add unit tests for IndexMergerV9 and fix some bugs
add more unit tests and fix bugs
handle null values and add more tests
minor changes & use LoggingProgressIndicator in IndexGeneratorReducer
make some static class public from IndexMerger
minor changes and add some comments
changes for comments
2016-01-16 11:25:28 +08:00
navis.ryu
976ebc45c0
Simplify information in IncrementalIndex
2016-01-12 10:18:11 +09:00
dclim
2308c8c07f
continue hadoop job for sparse intervals
2016-01-07 01:35:08 -07:00
fjy
faf421726b
remove IndexMaker
2015-12-28 14:19:02 -08:00
Fangjin Yang
14229ba0f2
Merge pull request #1922 from metamx/jsonIgnoresFinalFields
...
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
binlijin
219367221b
optimize InputRowSerde
2015-12-09 09:51:56 +08:00
Fangjin Yang
d957a6602c
Merge pull request #2049 from himanshug/hadoop_indexing_unique_path
...
add a unique string to intermediate path for the hadoop indexing task
2015-12-07 11:46:16 -08:00
Himanshu Gupta
6cfaf59d7e
add a unique string to intermediate path for the hadoop indexing task
2015-12-06 22:20:38 -06:00
Himanshu Gupta
62ba9ade37
unifying license header in all java files
2015-12-05 22:16:23 -06:00
Himanshu Gupta
61aaa09012
support multiple intervals in dataSource input spec
2015-12-03 21:28:04 -06:00
Fangjin Yang
21c84b5ff7
Merge pull request #1896 from gianm/allocate-segment
...
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Gian Merlino
e4e5f0375b
SegmentAllocateAction (fixes #1515)
...
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté
fa6142e217
cleanup and remove unused imports
2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a
Add backwards compatibility for PR #1922
2015-11-11 10:27:00 -08:00
Gian Merlino
dfbd0e2b60
Merge pull request #1925 from gianm/fix-index-generator
...
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:56:30 -08:00
Gian Merlino
75122dc396
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:19:58 -08:00
Himanshu Gupta
6bed633121
do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob
2015-11-06 00:40:51 -06:00
Charles Allen
929b981710
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-11-05 18:10:13 -08:00
Xavier Léauté
223d1ebe9f
fix a very old todo
2015-11-05 13:00:30 -08:00
fjy
8f231fd3e3
cleanup druid codebase
2015-11-04 13:59:53 -08:00
Himanshu Gupta
84f7d8d264
making static final variables in HadoopDruidIndexerConfig upper case
2015-11-02 23:24:26 -06:00
Himanshu Gupta
8b67417ac8
make methods in Index[Merger,Maker,IO] non-static so that they can have
...
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Himanshu Gupta
aeffeaf3e2
fixing hadoop test scope dependencies in indexing-hadoop
2015-10-26 17:09:39 -05:00
Nishant
3641a0e553
Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582
...
few fixes
delete intermediate file early
better exception handling
use static pattern instead of compiling it every time
Add retry for transient exceptions
remove usage of deprecated method.
Add test
fix imports
fix javadoc
review comment.
review comment: handle crazy snapshot naming
review comments
remove default retry count in favour of already present constant
review comment
make random intermediate and final paths.
review comment, use temporaryFolder where possible
2015-10-22 21:41:07 +05:30
Xavier Léauté
e4ac78e43d
bump next snapshot to 0.9.0
2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37
update version to 0.8.3
2015-10-14 21:40:55 -07:00
Himanshu Gupta
0368260018
For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE
2015-10-12 22:19:44 -05:00
Gian Merlino
3aba401ee0
SQLMetadataConnector: Retry table creation, in case something goes wrong.
...
Also rejigger table creation methods to not take a DBI. It's already available
inside the connector, and everyone was just using that one anyway.
2015-09-24 21:39:36 -07:00
Himanshu Gupta
e8b9ee85a7
HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow
2015-09-16 10:58:13 -05:00
Himanshu Gupta
74f4572bd4
Lazily deserialize "parser" to InputRowParser in DataSchema
...
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Himanshu Gupta
9ca6106128
user specified hadoop settings are ignored if explicitly set in code
2015-08-31 10:50:18 -05:00
Gian Merlino
940e1aa3eb
Replace funky imports with standard ones.
...
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
jon-wei
e5c4927b14
Add support for parsing BytesWritable strings to Hadoop Indexer
2015-08-28 14:27:14 -07:00
Gian Merlino
414a6fb477
Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
...
Fixes #1678. IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta
2e0dd1d792
adding UTs and addressing review comments to
...
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00