1029 Commits

Author SHA1 Message Date
Gian Merlino
977e867ad8 Downgrade geoip2, exclude com.google.http-client.
Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client
from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting
to run Druid along with an upgraded com.google.http-client) while preventing
Jackson conflicts pointed out in #2717.

Fixes #2717.

This reverts commit 21b7572533592f1700f86379483d87e9e340f2a7.
2016-03-25 14:43:22 -07:00
Gian Merlino
ff25325f3b Improved docs for multi-value dimensions.
- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
  linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
  "multi-value") in favor of "multi-value".
2016-03-22 14:40:55 -07:00
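
For context, the behavior the new doc centralizes (this example is illustrative, not from the commit): a string dimension fed an array of values becomes multi-value, and topN/groupBy treat each value as its own grouping key.

    {"timestamp": "2016-03-22T00:00:00Z", "page": "Druid", "tags": ["search", "analytics"]}

A groupBy on "tags" would produce one result row for "search" and one for "analytics".
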
Himanshu
00d7021291 Merge pull request #2607 from jon-wei/dim_schema
Support use of DimensionSchema class in DimensionsSpec
2016-03-22 11:53:46 -05:00
Himanshu
3220b109ad Merge pull request #2570 from binlijin/single_dimension_partitioning
Single dimension hash-based partitioning
2016-03-22 11:51:06 -05:00
binlijin
bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
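
A sketch of how this surfaces in a Hadoop tuningConfig, assuming the hashed partitionsSpec gains a partitionDimensions field as the PR describes (dimension name and target size are hypothetical):

    "partitionsSpec": {
      "type": "hashed",
      "targetPartitionSize": 5000000,
      "partitionDimensions": ["host"]
    }

Hashing on one dimension instead of the whole row keeps rows with the same "host" value in the same segment.
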
jon-wei
a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
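
Illustratively, a dimensionsSpec should now be able to mix plain dimension names with typed DimensionSchema objects, along these lines (names and values are hypothetical):

    "dimensionsSpec": {
      "dimensions": [
        "page",
        {"type": "string", "name": "userId"}
      ]
    }
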
Gian Merlino
738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Himanshu
ea3281ad78 Merge pull request #2645 from atomx/gs-scheme
Add gs:// hdfs support
2016-03-14 22:15:42 -05:00
Erik Dubbelboer
375620cfb3 Add gs:// hdfs support
Used to access Google Cloud Storage
2016-03-12 08:57:57 +00:00
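
In practice this lets a Hadoop ingestion inputSpec read straight from Google Cloud Storage, e.g. (hypothetical bucket and path):

    "inputSpec": {
      "type": "static",
      "paths": "gs://example-bucket/events/2016-03-01/part-*.json"
    }
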
Gian Merlino
187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Fangjin Yang
1e49092ce7 Merge pull request #2627 from himanshug/fix_datasource_inputformat_locations
fix regression: bug in DatasourceInputFormat best-effort split location finder code
2016-03-10 13:46:04 -08:00
Himanshu Gupta
eab8a0b54d in the DatasourceInputFormat code for determining segment block locations, avoid the split calculation done by the helper TextInputFormat 2016-03-10 14:28:53 -06:00
Nishant
ba1185963b Fix a bunch of dependencies
* Eliminate exclusion groups from pull-deps
* Only consider dependency nodes in pull-deps if they are not in the following scopes
	* provided
	* test
	* system
* Fix a bunch of missing `<scope>provided</scope>` tags
* Better exclusions for a couple of problematic libs
2016-03-10 10:18:08 -08:00
Bingkun Guo
c20d7682a9 log exceptions correctly in DatasourceInputFormat and IndexGeneratorJob 2016-03-09 13:41:31 -06:00
gaodayue
a6dc3703ca use ISODateTimeFormat for both hdfs and viewfs schemes to support federated HDFS 2016-03-08 13:55:05 +08:00
Björn Zettergren
2462c82c0e New defaults for maxRowsInMemory and rowFlushBoundary
To bring consistency to docs and source, this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000, following
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000; it is lower now on the grounds that it is
better for a default to be somewhat less efficient and work than to reach
for the stars and possibly result in "OutOfMemoryError: Java heap space"
errors.
2016-03-01 13:50:28 +01:00
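
For reference, a minimal realtime tuningConfig with the new default spelled out (a sketch in the shape of the 0.9.x docs; rowFlushBoundary is the equivalent knob in the Hadoop indexer):

    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 75000
    }
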
Gian Merlino
3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
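
A sketch of opting in to strict parsing (the option name is from this commit; all other tuningConfig fields are omitted):

    "tuningConfig": {
      "type": "realtime",
      "reportParseExceptions": true
    }

With this set, even a partially unparseable row throws a ParseException rather than being counted under "processed".
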
Himanshu Gupta
09ffcae4ae give the user the option to specify the segments for the dataSource inputSpec 2016-02-21 23:15:31 -06:00
Himanshu Gupta
2faae9d0d1 In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path 2016-02-09 21:42:32 -06:00
Himanshu Gupta
b3437825f0 add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found 2016-01-26 17:23:55 -06:00
binlijin
cd1c71ceb4 rename persistBackgroundCount to numBackgroundPersistThreads 2016-01-22 14:29:41 +08:00
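
After the rename, the option would sit in a Hadoop tuningConfig roughly as follows (the value is hypothetical; presumably 0 keeps persists on the reducer thread):

    "tuningConfig": {
      "type": "hadoop",
      "numBackgroundPersistThreads": 1
    }
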
Charles Allen
2a69a58570 Merge pull request #2149 from binlijin/master
Persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Fangjin Yang
996c1173c6 Merge pull request #2223 from navis/besteffort-split-locations
Best effort to find locations for input splits
2016-01-20 16:53:43 -08:00
Fangjin Yang
695f107870 Merge pull request #2302 from metamx/lowerCaseGranPathTest
Make GranularityPathSpecTest check with lower-case enums
2016-01-20 09:18:06 -08:00
Charles Allen
3c5ca3a5f2 Make GranularityPathSpecTest check with lower-case enums 2016-01-20 08:35:13 -08:00
binlijin
8e43e2c446 Persist IncrementalIndex in another thread in IndexGeneratorReducer 2016-01-20 09:20:09 +08:00
jon-wei
747343e621 Preserve dimension order across indexes during ingestion 2016-01-19 13:34:11 -08:00
Jonathan Wei
df2906a91c Merge pull request #2290 from gianm/index-merger-v9-stuff
Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.
2016-01-19 13:04:00 -08:00
Gian Merlino
1dcf22edb7 Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes.
Also parameterize some tests to run with/without buildV9Directly:

- IndexGeneratorJobTest
- RealtimeIndexTaskTest
- RealtimePlumberSchoolTest
2016-01-19 12:15:06 -08:00
Himanshu Gupta
164b0aad7a remove Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and use the Metadata class
instead of a Map to store segment metadata
2016-01-18 22:03:46 -06:00
navis.ryu
f03f7fb625 Best effort to find locations for input splits 2016-01-18 08:31:05 +09:00
Kurt Young
82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
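
A sketch of turning the new flag on (the flag name is from this commit; surrounding tuningConfig fields are omitted):

    "tuningConfig": {
      "type": "realtime",
      "buildV9Directly": true
    }
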
Kurt Young
1f2168fae5 add IndexMergerV9
- add unit tests for IndexMergerV9 and fix some bugs
- add more unit tests and fix more bugs
- handle null values and add more tests
- use LoggingProgressIndicator in IndexGeneratorReducer, plus minor changes
- make some static classes from IndexMerger public
- add some comments, plus minor changes
- apply changes from review comments
2016-01-16 11:25:28 +08:00
navis.ryu
976ebc45c0 Simplify information in IncrementalIndex 2016-01-12 10:18:11 +09:00
dclim
2308c8c07f continue hadoop job for sparse intervals 2016-01-07 01:35:08 -07:00
fjy
faf421726b remove IndexMaker 2015-12-28 14:19:02 -08:00
Fangjin Yang
14229ba0f2 Merge pull request #1922 from metamx/jsonIgnoresFinalFields
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
binlijin
219367221b optimize InputRowSerde 2015-12-09 09:51:56 +08:00
Fangjin Yang
d957a6602c Merge pull request #2049 from himanshug/hadoop_indexing_unique_path
add a unique string to intermediate path for the hadoop indexing task
2015-12-07 11:46:16 -08:00
Himanshu Gupta
6cfaf59d7e add a unique string to intermediate path for the hadoop indexing task 2015-12-06 22:20:38 -06:00
Himanshu Gupta
62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Himanshu Gupta
61aaa09012 support multiple intervals in dataSource input spec 2015-12-03 21:28:04 -06:00
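
A sketch of a dataSource inputSpec carrying more than one interval (dataSource name and intervals are hypothetical; nesting follows the batch-ingestion docs):

    "inputSpec": {
      "type": "dataSource",
      "ingestionSpec": {
        "dataSource": "wikipedia",
        "intervals": ["2015-11-01/2015-11-08", "2015-12-01/2015-12-08"]
      }
    }
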
Fangjin Yang
21c84b5ff7 Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Gian Merlino
e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté
fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a Add backwards compatibility for PR #1922 2015-11-11 10:27:00 -08:00
Gian Merlino
dfbd0e2b60 Merge pull request #1925 from gianm/fix-index-generator
Fix reference to INDEX_MAKER in IndexGeneratorJob.
2015-11-06 09:56:30 -08:00
Gian Merlino
75122dc396 Fix reference to INDEX_MAKER in IndexGeneratorJob. 2015-11-06 09:19:58 -08:00
Himanshu Gupta
6bed633121 do not use LoggingProgressIndicator in IndexGeneratorJob, because it uses Guava Stopwatch methods that are not available in older Guava versions; this makes the behavior the same as LegacyIndexGeneratorJob 2015-11-06 00:40:51 -06:00
Charles Allen
929b981710 Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to 2015-11-05 18:10:13 -08:00