Fokko Driesprong
67920c114e
Fixed info message ( #3481 )
2016-09-21 15:50:29 -07:00
Gian Merlino
27bd5cb13a
Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. ( #3473 )
...
Fixes #3241 .
2016-09-21 13:40:04 -06:00
Slim
ba6ddf307e
Adding hadoop kerberos authentification. ( #3419 )
...
* adding kerberos authentication
* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Jonathan Wei
df766b2bbd
Add dimension handling interface for ingestion and segment creation ( #3217 )
...
* Add dimension handling interface for ingestion and segment creation
* update javadocs for DimensionHandler/DimensionIndexer
* Move IndexIO row validation into DimensionHandler
* Fix null column skipping in mergerV9
* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion
* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Himanshu
3b6c81e7c0
fix cleanup of hadoop ingestion intermediate path ( #3385 )
2016-09-08 01:36:56 +05:30
Dave Li
c4e8440c22
Adds long compression methods ( #3148 )
...
* add read
* update deprecated guava calls
* add write and vsizeserde
* add benchmark
* separate encoding and compression
* add header and reformat
* update doc
* address PR comment
* fix buffer order
* generate benchmark files
* separate encoding strategy and format
* fix benchmark
* modify supplier write to channel
* add float NONE handling
* address PR comment
* address PR comment 2
2016-08-30 16:17:46 -07:00
Hamlet Lee
e4f0eac8e6
Fix issue #2707 ( #2708 )
2016-08-16 12:19:44 -05:00
Gian Merlino
a2bcd97512
IncrementalIndex: Fix multi-value dimensions returned from iterators. ( #3344 )
...
They had arrays as values, which MapBasedRow doesn't understand and
toStrings rather than converting to lists.
2016-08-10 08:47:29 -07:00
Gian Merlino
9437a7a313
HLL: Avoid some allocations when possible. ( #3314 )
...
- HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
- HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
- Updated call sites where appropriate to duplicate BBs passed to HLLC.
2016-08-03 18:08:52 -07:00
kaijianding
50d52a24fc
ability to not rollup at index time, make pre aggregation an option ( #3020 )
...
* ability to not rollup at index time, make pre aggregation an option
* rename getRowIndexForRollup to getPriorIndex
* fix doc misspelling
* test query using no-rollup indexes
* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
kaijianding
3dc2974894
Add timestampSpec to metadata.drd and SegmentMetadataQuery ( #3227 )
...
* save TimestampSpec in metadata.drd
* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00
Navis Ryu
cd7337fc8a
Calculate max split size based on numMapTask in DatasourceInputFormat ( #2882 )
...
* Calculate max split size based on numMapTask
* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
Hyukjin Kwon
55e7a52475
Replace deprecated usage for StringInputRowParser and JSONParseSpec ( #3215 )
2016-07-14 09:19:17 -07:00
Gian Merlino
ea03906fcf
Configurable compressRunOnSerialization for Roaring bitmaps. ( #3228 )
...
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Hyukjin Kwon
45f553fc28
Replace the deprecated usage of NoneShardSpec ( #3166 )
2016-06-25 10:27:25 -07:00
Nishant
2696b0c451
Retry for transient exceptions while doing cleanup for Hadoop Jobs ( #3177 )
...
* fix 1828
fixes https://github.com/druid-io/druid/issues/1828
* remove unused import
* Review comment
2016-06-23 13:38:47 -07:00
Nishant
6f330dc816
Better handling for parseExceptions for Batch Ingestion ( #3171 )
...
* Better handling for parseExceptions
* make parseException handling consistent with Realtime
* change combiner default val to true
* review comments
* review comments
2016-06-22 16:38:29 -07:00
Gian Merlino
ebf890fe79
Update master version to 0.9.2-SNAPSHOT. ( #3133 )
2016-06-13 13:10:38 -07:00
Nishant
778f97a80e
attempt to fix-2906 ( #2985 )
...
* attempt to fix-2984
* review comments
* Add test
2016-05-18 15:12:38 -05:00
Charles Allen
15ccf451f9
Move QueryGranularity static fields to QueryGranularities ( #2980 )
...
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979
* Add test showing #2979
* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
David Lim
b489f63698
Supervisor for KafkaIndexTask ( #2656 )
...
* supervisor for kafka indexing tasks
* cr changes
2016-05-04 23:13:13 -07:00
Navis Ryu
49ef4d96cb
Merge pull request #2802 from navis/optimize_multiplepath_concat
...
Optimize adding lots of paths to pathspec
2016-04-11 23:35:28 -05:00
jon-wei
0e481d6f93
Allow filters to use extraction functions
2016-04-05 13:24:56 -07:00
Gian Merlino
977e867ad8
Downgrade geoip2, exclude com.google.http-client.
...
Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client
from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting
to run Druid along with an upgraded com.google.http-client) while preventing
Jackson conflicts pointed out in #2717 .
Fixes #2717 .
This reverts commit 21b7572533
.
2016-03-25 14:43:22 -07:00
Gian Merlino
ff25325f3b
Improved docs for multi-value dimensions.
...
- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
"multi-value") in favor of "multi-value".
2016-03-22 14:40:55 -07:00
Himanshu
00d7021291
Merge pull request #2607 from jon-wei/dim_schema
...
Support use of DimensionSchema class in DimensionsSpec
2016-03-22 11:53:46 -05:00
Himanshu
3220b109ad
Merge pull request #2570 from binlijin/single_dimension_partitioning
...
Single dimension hash-based partitioning
2016-03-22 11:51:06 -05:00
binlijin
bce600f5d5
Single dimension hash-based partitioning
2016-03-22 13:15:33 +08:00
jon-wei
a59c9ee1b1
Support use of DimensionSchema class in DimensionsSpec
2016-03-21 13:12:04 -07:00
Gian Merlino
738dcd8cd9
Update version to 0.9.1-SNAPSHOT.
...
Fixes #2462
2016-03-17 10:34:20 -07:00
Himanshu
ea3281ad78
Merge pull request #2645 from atomx/gs-scheme
...
Add gs:// hdfs support
2016-03-14 22:15:42 -05:00
Erik Dubbelboer
375620cfb3
Add gs:// hdfs support
...
Used to access google cloud storage
2016-03-12 08:57:57 +00:00
Gian Merlino
187569e702
DataSource metadata.
...
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).
DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Fangjin Yang
1e49092ce7
Merge pull request #2627 from himanshug/fix_datasource_inputformat_locations
...
fix regression - bug in DatasourceInputFormat best effort split location finder code
2016-03-10 13:46:04 -08:00
Himanshu Gupta
eab8a0b54d
in DatasourceInputFormat code for determining segment block locations avoid the split calulation by helper TextInputFormat
2016-03-10 14:28:53 -06:00
Nishant
ba1185963b
Fix a bunch of dependencies
...
* Eliminate exclusion groups from pull-deps
* Only consider dependency nodes in pull-deps if they are not in the following scopes
* provided
* test
* system
* Fix a bunch of `<scope>provided</scope>` missing tags
* Better exclusions for a couple of problematic libs
2016-03-10 10:18:08 -08:00
Bingkun Guo
c20d7682a9
log exceptions correctly in DatasourceInputFormat and IndexGeneratorJob
2016-03-09 13:41:31 -06:00
gaodayue
a6dc3703ca
use ISODataTimeFormat for both hdfs and viewfs schema to support Federationed HDFS
2016-03-08 13:55:05 +08:00
Björn Zettergren
2462c82c0e
New defaults for maxRowsInMemory rowFlushBoundary
...
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457 .
The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Gian Merlino
3534483433
Better handling of ParseExceptions.
...
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".
Behavior of the counters should now be:
- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).
If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).
Fixes #2510 .
2016-02-23 10:11:43 -08:00
Himanshu Gupta
09ffcae4ae
give user the option to specify the segments for dataSource inputSpec
2016-02-21 23:15:31 -06:00
Himanshu Gupta
2faae9d0d1
In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path
2016-02-09 21:42:32 -06:00
Himanshu Gupta
b3437825f0
add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found
2016-01-26 17:23:55 -06:00
binlijin
cd1c71ceb4
rename persistBackgroundCount to numBackgroundPersistThreads
2016-01-22 14:29:41 +08:00
Charles Allen
2a69a58570
Merge pull request #2149 from binlijin/master
...
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Fangjin Yang
996c1173c6
Merge pull request #2223 from navis/besteffort-split-locations
...
Best effort to find locations for input splits
2016-01-20 16:53:43 -08:00
Fangjin Yang
695f107870
Merge pull request #2302 from metamx/lowerCaseGranPathTest
...
Make GranularityPathSpecTest check with lower-case enums
2016-01-20 09:18:06 -08:00
Charles Allen
3c5ca3a5f2
Make GranularityPathSpecTest check with lower-case enums
2016-01-20 08:35:13 -08:00
binlijin
8e43e2c446
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 09:20:09 +08:00
jon-wei
747343e621
Preserve dimension order across indexes during ingestion
2016-01-19 13:34:11 -08:00
Jonathan Wei
df2906a91c
Merge pull request #2290 from gianm/index-merger-v9-stuff
...
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.
2016-01-19 13:04:00 -08:00
Gian Merlino
1dcf22edb7
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes.
...
Also parameterize some tests to run with/without buildV9Directly:
- IndexGeneratorJobTest
- RealtimeIndexTaskTest
- RealtimePlumberSchoolTest
2016-01-19 12:15:06 -08:00
Himanshu Gupta
164b0aad7a
removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class
...
instead of a Map to store segment metadata
2016-01-18 22:03:46 -06:00
navis.ryu
f03f7fb625
Best effort to find locations for input splits
2016-01-18 08:31:05 +09:00
Kurt Young
82ff98c2bf
add config for build v9 directly and update docs
2016-01-16 11:26:34 +08:00
Kurt Young
1f2168fae5
add IndexMergerV9
...
add unit tests for IndexMergerV9 and fix some bugs
add more unit tests and fix bugs
handle null values and add more tests
minor changes & use LoggingProgressIndicator in IndexGeneratorReducer
make some static class public from IndexMerger
minor changes and add some comments
changes for comments
2016-01-16 11:25:28 +08:00
navis.ryu
976ebc45c0
Simplify information in IncrementalIndex
2016-01-12 10:18:11 +09:00
dclim
2308c8c07f
continue hadoop job for sparse intervals
2016-01-07 01:35:08 -07:00
fjy
faf421726b
remove IndexMaker
2015-12-28 14:19:02 -08:00
Fangjin Yang
14229ba0f2
Merge pull request #1922 from metamx/jsonIgnoresFinalFields
...
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
binlijin
219367221b
optimize InputRowSerde
2015-12-09 09:51:56 +08:00
Fangjin Yang
d957a6602c
Merge pull request #2049 from himanshug/hadoop_indexing_unique_path
...
add a unique string to intermediate path for the hadoop indexing task
2015-12-07 11:46:16 -08:00
Himanshu Gupta
6cfaf59d7e
add a unique string to intermediate path for the hadoop indexing task
2015-12-06 22:20:38 -06:00
Himanshu Gupta
62ba9ade37
unifying license header in all java files
2015-12-05 22:16:23 -06:00
Himanshu Gupta
61aaa09012
support multiple intervals in dataSource input spec
2015-12-03 21:28:04 -06:00
Fangjin Yang
21c84b5ff7
Merge pull request #1896 from gianm/allocate-segment
...
SegmentAllocateAction (fixes #1515 )
2015-11-18 21:05:46 -08:00
Gian Merlino
e4e5f0375b
SegmentAllocateAction ( fixes #1515 )
...
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté
fa6142e217
cleanup and remove unused imports
2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a
Add backwards compatability for PR #1922
2015-11-11 10:27:00 -08:00
Gian Merlino
dfbd0e2b60
Merge pull request #1925 from gianm/fix-index-generator
...
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:56:30 -08:00
Gian Merlino
75122dc396
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:19:58 -08:00
Himanshu Gupta
6bed633121
do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob
2015-11-06 00:40:51 -06:00
Charles Allen
929b981710
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-11-05 18:10:13 -08:00
Xavier Léauté
223d1ebe9f
fix a very old todo
2015-11-05 13:00:30 -08:00
fjy
8f231fd3e3
cleanup druid codebase
2015-11-04 13:59:53 -08:00
Himanshu Gupta
84f7d8d264
making static final variables in HadoopDruidIndexerConfig upper case
2015-11-02 23:24:26 -06:00
Himanshu Gupta
8b67417ac8
make methods in Index[Merger,Maker,IO] non-static so that they can have
...
appropriate ObjectMapper injected instead of creating one statically
2015-11-02 23:24:26 -06:00
Himanshu Gupta
aeffeaf3e2
fixing hadoop test scope dependencies in indexing-hadoop
2015-10-26 17:09:39 -05:00
Nishant
3641a0e553
Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582
...
few fixes
delete intermediate file early
better exception handling
use static pattern instead of compiling it every time
Add retry for transient exceptions
remove usage of deprecated method.
Add test
fix imports
fix javadoc
review comment.
review comment: handle crazy snapshot naming
review comments
remove default retry count in favour of already present constant
review comment
make random intermediate and final paths.
review comment, use temporaryFolder where possible
2015-10-22 21:41:07 +05:30
Xavier Léauté
e4ac78e43d
bump next snapshot to 0.9.0
2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37
update version to 0.8.3
2015-10-14 21:40:55 -07:00
Himanshu Gupta
0368260018
For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE
2015-10-12 22:19:44 -05:00
Gian Merlino
3aba401ee0
SQLMetadataConnector: Retry table creation, in case something goes wrong.
...
Also rejigger table creation methods to not take a DBI. It's already available
inside the connector, and everyone was just using that one anyway.
2015-09-24 21:39:36 -07:00
Himanshu Gupta
e8b9ee85a7
HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow
2015-09-16 10:58:13 -05:00
Himanshu Gupta
74f4572bd4
Lazily deserialize "parser" to InputRowParser in DataSchema
...
so that user hadoop related InputRowParsers are created only when needed
this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser
and not fail because hadoopy InputRowParser might need hadoop libraries
2015-09-16 10:58:13 -05:00
Himanshu Gupta
9ca6106128
user specified hadoop settings are ignored if explicitly set in code
2015-08-31 10:50:18 -05:00
Gian Merlino
940e1aa3eb
Replace funky imports with standard ones.
...
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
jon-wei
e5c4927b14
Add support for parsing BytesWritable strings to Hadoop Indexer
2015-08-28 14:27:14 -07:00
Gian Merlino
414a6fb477
Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat.
...
Fixes #1678 . IngestSegmentFirehose (and its users) need to remember which
windows of which segments should actually be read, based on a timeline.
2015-08-28 07:32:41 -07:00
Himanshu Gupta
2e0dd1d792
adding UTs and addressing review comments to
...
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq
2237a8cf0f
kafka 8 simple consumer firehose
2015-08-27 20:50:46 -05:00
Charles Allen
e38cf54bc8
Migrate TestDerbyConnector to a JUnit @Rule
2015-08-26 21:47:40 -07:00
Himanshu Gupta
b3c570e78d
update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling
2015-08-20 21:36:34 -05:00
Himanshu Gupta
85e3ce9096
split hadoop glob path before adding it to MultipleInputs
...
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta
a603bd9547
HadoopGlobPathSplitter implementation to split hadoop glob paths
...
This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed
2015-08-20 21:36:34 -05:00
Himanshu Gupta
cf3ec8eb46
helpful cause explaining why SegmentDescriptorInfo did not exist
2015-08-19 10:29:04 -05:00
Xavier Léauté
3b2e41e42a
update for next release
2015-08-18 17:16:46 -07:00
Himanshu Gupta
a3bab5b7d9
IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing
2015-08-16 14:07:35 -05:00
Himanshu Gupta
15fa43dd43
changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer
...
also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up
2015-08-16 14:07:35 -05:00
Himanshu Gupta
45947a1021
add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion
...
Conflicts:
indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java
indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java
Conflicts:
indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java
2015-08-16 13:15:38 -05:00