160 Commits

Author SHA1 Message Date
Jihoon Son
50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
Himanshu
de081c711b RealtimeIndexTask to support alertTimeout in context (#4089)
* RealtimeIndexTask to support alertTimeout in context and raise alert if task process exists after the timeout

* move alertTimeout config to tuningConfig and document
2017-03-24 12:48:12 -07:00
Gian Merlino
b4289c0004 Remove "granularity" from IngestSegmentFirehose. (#4110)
It wasn't doing anything useful (the sequences were being concatted, and
cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE.
Changing it to Granularities.ALL gave me a 700x+ performance boost on a
small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding
making a lot of unnecessary column selectors.
2017-03-24 10:28:54 -07:00
Gian Merlino
cab2e2f5d5 Add docs about filtering and indexes on numeric columns. (#4035) 2017-03-10 12:48:59 -08:00
kaijianding
19ac1c7c2c Add SameIntervalMergeTask for easier usage of MergeTask (#3981)
* Add SameIntervalMergeTask for easier usage of MergeTask

* fix a bug and add ut

* remove same_interval_merge_sub from Task.java and remove other no needed code
2017-03-06 11:21:32 -06:00
Jonathan Wei
a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
kaijianding
ef6a19c81b buildV9Directly in MergeTask and AppendTask (#3976)
* buildV9Directly in MergeTask and AppendTask

* add doc
2017-02-28 10:04:32 -08:00
praveev
c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Jihoon Son
ebd100cbb0 Set default query granularity for null value (#3965) 2017-02-22 17:38:43 -08:00
Himanshu
9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Gian Merlino
151ff6d064 flattenSpec: Document that "expr" is ignored for type "root". (#3884) 2017-01-31 10:27:20 -08:00
David Lim
ff52581bd3 IndexTask improvements (#3611)
* index task improvements

* code review changes

* add null check
2017-01-18 14:24:37 -08:00
Gian Merlino
bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
praveev
52a74cf84f Use timestamp in millis as Map key instead of DateTime object (#3674)
* Use Long timestamp as key instead of DateTime.

DateTime representation is screwed up when you store with an obj
and read with a different DateTime obj.

For example: The code below fails when you use DateTime as key
```
        DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles")));
        HashMap<DateTime, String> map = new HashMap<>();
        map.put(odt, "abc");
        DateTime dt = new DateTime(odt.getMillis());
        System.out.println(map.get(dt));
```

* Respect timezone when creating the file.

* Update docs with timezone caveat in granularity spec

* Remove unused imports
2016-11-11 10:20:20 -08:00
Akash Dwivedi
3a83e0513e Doc update(batch-ingestion) to include useExplicitVersion. (#3557) 2016-10-07 14:48:00 -07:00
praveev
43cdc675c7 Add support for timezone in segment granularity (#3528)
* Add support for timezone in segment granularity

* CR feedback. Handle null timezone during equals check.

* Include timezone in docs.
Add timezone for ArbitraryGranularitySpec.
2016-10-03 08:15:42 -07:00
Gian Merlino
27bd5cb13a Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473)
Fixes #3241.
2016-09-21 13:40:04 -06:00
Gian Merlino
e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Gian Merlino
76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Slim
ba6ddf307e Adding hadoop kerberos authentification. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Dave Li
c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
kaijianding
50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Gian Merlino
e5397ed316 Link up Hadoop class loading docs better. (#3302) 2016-07-29 10:19:54 -07:00
Navis Ryu
cd7337fc8a Calculate max split size based on numMapTask in DatasourceInputFormat (#2882)
* Calculate max split size based on numMapTask

* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
Gian Merlino
ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Jonathan Wei
c5dbf364e3 Fix JSON flatten docs, add link to path expression tester (#3105) 2016-06-07 14:39:57 -07:00
Nishant
0ac1b27d53 Allow manually setting of shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting of shutoffTime for EventReceiverFirehose

Allow dynamically setting shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
Gian Merlino
fffa9c8265 Fix flattenSpec docs, "nested" should be "path". (#2924) 2016-05-05 08:59:41 -07:00
David Lim
890bdb543d doc fixes (#2897) 2016-04-28 15:34:58 -07:00
Fangjin Yang
abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
Sébastien Launay
37d2ab623e Merge pull request #2815 from slaunay/documentation/hadoop-classpath-issue-fix-with-configuration
Doc for mapreduce.job.user.classpath.first=true
2016-04-12 10:51:51 -07:00
Himanshu Gupta
004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Gian Merlino
2dfd3877c0 Fix a bunch of broken links in the docs. 2016-03-23 10:21:28 -07:00
fjy
943cbe6e76 refactor extensions into their own docs 2016-03-22 18:54:10 -07:00
binlijin
bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
Gian Merlino
a2b1652787 Clarify parser docs.
- Clarify what parseSpecs are used for.
- Avro, Protobuf should use timeAndDims parseSpecs.
- Hadoop jobs should use hadoopyString string parsers.
2016-03-10 08:45:04 -08:00
fjy
e3e932a4d4 refactor extensions into core and contrib 2016-03-08 17:12:09 -08:00
Fangjin Yang
8e36e6fa43 Merge pull request #2610 from dclim/add-combineText-doc
add combineText property and cleanup batch ingestion doc
2016-03-08 12:54:16 -08:00
dclim
df29667a89 add combineText property and cleanup batch ingestion doc 2016-03-08 13:10:34 -07:00
Himanshu Gupta
0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Slim Bouguerra
623e89aa54 skip corrupt message 2016-03-04 08:30:40 -06:00
Björn Zettergren
2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen
1fe277ee29 Merge pull request #2367 from se7entyse7en/feature-rackspace-cloud-files-static-firehose
Adds support to use Rackspace's cloudfiles as static firehose
2016-02-25 17:31:06 -08:00
Gian Merlino
3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Himanshu Gupta
21b0b8a07d new coordinator endpoint to get list of used segment given a dataSource and list of intervals 2016-02-21 23:17:58 -06:00
Himanshu Gupta
09ffcae4ae give user the option to specify the segments for dataSource inputSpec 2016-02-21 23:15:31 -06:00
Fangjin Yang
083f019a48 Merge pull request #2465 from druid-io/more-doc-fix
more doc fixes
2016-02-17 11:00:38 -08:00
fjy
7da6594bfe more doc fixes 2016-02-17 09:43:47 -08:00
Gian Merlino
3a996216bd Multivalued dimensions can be compressed since 0.8.0. 2016-02-17 08:33:21 -08:00
Himanshu
f6eebf5884 Merge pull request #2422 from rasahner/docMinorFixes
some minor doc changes
2016-02-09 10:03:22 -06:00