149 Commits

Author SHA1 Message Date
David Lim ff52581bd3 IndexTask improvements (#3611)
* index task improvements

* code review changes

* add null check
2017-01-18 14:24:37 -08:00
Gian Merlino bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
praveev 52a74cf84f Use timestamp in millis as Map key instead of DateTime object (#3674)
* Use Long timestamp as key instead of DateTime.

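Joda's DateTime.equals() and hashCode() take the chronology, including the
time zone, into account. A value stored under one DateTime object cannot be
read back with a different DateTime object for the same instant in another zone.

For example, the code below fails when DateTime is used as the key: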
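```
import java.util.HashMap;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

DateTime odt = DateTime.now(DateTimeZone.forID("America/Los_Angeles"));
HashMap<DateTime, String> map = new HashMap<>();
map.put(odt, "abc");
DateTime dt = new DateTime(odt.getMillis()); // same instant, JVM default zone
// Prints "null" unless the default zone is America/Los_Angeles: the zones differ.
System.out.println(map.get(dt));
```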

* Respect timezone when creating the file.

* Update docs with timezone caveat in granularity spec

* Remove unused imports
2016-11-11 10:20:20 -08:00
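A minimal sketch of the fix described in this commit, runnable on the JDK alone. It swaps in `java.time.ZonedDateTime` for Joda's `DateTime` (its `equals()` likewise compares the zone); the class and method names are hypothetical and this is not Druid's code.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Map;

final class MillisKeyDemo {
  // Stores under a Los Angeles timestamp, then looks up with the same instant in UTC.
  static String lookupByDateTime() {
    ZonedDateTime la = ZonedDateTime.now(ZoneId.of("America/Los_Angeles"));
    Map<ZonedDateTime, String> map = new HashMap<>();
    map.put(la, "abc");
    ZonedDateTime utc = la.withZoneSameInstant(ZoneId.of("UTC"));
    return map.get(utc); // null: equals() also compares the zone
  }

  // Same scenario, but keyed by epoch millis as the commit does.
  static String lookupByMillis() {
    ZonedDateTime la = ZonedDateTime.now(ZoneId.of("America/Los_Angeles"));
    Map<Long, String> map = new HashMap<>();
    map.put(la.toInstant().toEpochMilli(), "abc");
    ZonedDateTime utc = la.withZoneSameInstant(ZoneId.of("UTC"));
    return map.get(utc.toInstant().toEpochMilli()); // "abc": same instant, same key
  }
}
```

A `long` key identifies only the instant, so lookups no longer depend on which zone each caller's object happens to carry.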
Akash Dwivedi 3a83e0513e Doc update(batch-ingestion) to include useExplicitVersion. (#3557) 2016-10-07 14:48:00 -07:00
praveev 43cdc675c7 Add support for timezone in segment granularity (#3528)
* Add support for timezone in segment granularity

* CR feedback. Handle null timezone during equals check.

* Include timezone in docs.
Add timezone for ArbitraryGranularitySpec.
2016-10-03 08:15:42 -07:00
Gian Merlino 27bd5cb13a Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473)
Fixes #3241.
2016-09-21 13:40:04 -06:00
Gian Merlino e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Gian Merlino 76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Slim ba6ddf307e Adding hadoop kerberos authentication. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark failure due to JMH bug
2016-08-02 11:13:05 -07:00
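A minimal sketch of what making pre-aggregation optional means. It assumes a hypothetical row with one timestamp, one dimension, and one summed metric. The names are illustrative and are not Druid's IncrementalIndex API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RollupSketch {
  /** Hypothetical input row: a timestamp, one dimension value, and one long metric. */
  static final class Row {
    final long timestamp;
    final String dim;
    final long metric;

    Row(long timestamp, String dim, long metric) {
      this.timestamp = timestamp;
      this.dim = dim;
      this.metric = metric;
    }
  }

  /**
   * With rollup, rows sharing a truncated timestamp and dimension value collapse into
   * one row whose metric is the sum. Without rollup, every input row is kept as-is.
   */
  static List<Row> index(List<Row> rows, long granularityMillis, boolean rollup) {
    if (!rollup) {
      return new ArrayList<>(rows);
    }
    Map<List<Object>, Row> combined = new LinkedHashMap<>();
    for (Row r : rows) {
      long bucket = r.timestamp - Math.floorMod(r.timestamp, granularityMillis);
      List<Object> key = Arrays.asList(bucket, r.dim);
      Row prev = combined.get(key);
      long sum = (prev == null ? 0L : prev.metric) + r.metric;
      combined.put(key, new Row(bucket, r.dim, sum));
    }
    return new ArrayList<>(combined.values());
  }
}
```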
Gian Merlino e5397ed316 Link up Hadoop class loading docs better. (#3302) 2016-07-29 10:19:54 -07:00
Navis Ryu cd7337fc8a Calculate max split size based on numMapTask in DatasourceInputFormat (#2882)
* Calculate max split size based on numMapTask

* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
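A hedged sketch of deriving a maximum split size from a map-task count. The commit message does not give the formula. Ceiling division and a fallback default are assumptions, as is the guess that the ArithmeticException came from dividing by a zero task count. The names are hypothetical and not taken from DatasourceInputFormat.

```java
final class SplitSizeSketch {
  /**
   * Returns a per-split byte budget that spreads totalBytes over roughly numMapTasks
   * splits, or defaultMaxSplitSize when the task count is not positive (dividing by
   * zero there would throw ArithmeticException).
   */
  static long maxSplitSize(long totalBytes, int numMapTasks, long defaultMaxSplitSize) {
    if (numMapTasks <= 0) {
      return defaultMaxSplitSize;
    }
    // Ceiling division so that numMapTasks splits always cover every byte.
    return (totalBytes + numMapTasks - 1) / numMapTasks;
  }
}
```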
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Jonathan Wei c5dbf364e3 Fix JSON flatten docs, add link to path expression tester (#3105) 2016-06-07 14:39:57 -07:00
Nishant 0ac1b27d53 Allow manually setting shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting shutoffTime for EventReceiverFirehose

Allow dynamically setting shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
Gian Merlino fffa9c8265 Fix flattenSpec docs, "nested" should be "path". (#2924) 2016-05-05 08:59:41 -07:00
David Lim 890bdb543d doc fixes (#2897) 2016-04-28 15:34:58 -07:00
Fangjin Yang abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
Sébastien Launay 37d2ab623e Merge pull request #2815 from slaunay/documentation/hadoop-classpath-issue-fix-with-configuration
Doc for mapreduce.job.user.classpath.first=true
2016-04-12 10:51:51 -07:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Gian Merlino 2dfd3877c0 Fix a bunch of broken links in the docs. 2016-03-23 10:21:28 -07:00
fjy 943cbe6e76 refactor extensions into their own docs 2016-03-22 18:54:10 -07:00
binlijin bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
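A sketch of partitioning on the hash of a single dimension. It assumes rows are maps from dimension name to string value. `String.hashCode()` and `Math.floorMod` stand in for whatever hash and bucketing Druid actually uses, and the names are hypothetical.

```java
import java.util.Map;

final class SingleDimHashPartitioner {
  /**
   * Picks a shard from the value of one partition dimension only, so every row with the
   * same value of that dimension lands in the same shard regardless of its other
   * dimensions. A missing value is treated as hash 0.
   */
  static int shardFor(Map<String, String> row, String partitionDimension, int numShards) {
    String value = row.get(partitionDimension);
    int hash = value == null ? 0 : value.hashCode();
    return Math.floorMod(hash, numShards); // always in [0, numShards)
  }
}
```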
Gian Merlino a2b1652787 Clarify parser docs.
- Clarify what parseSpecs are used for.
- Avro, Protobuf should use timeAndDims parseSpecs.
- Hadoop jobs should use hadoopyString string parsers.
2016-03-10 08:45:04 -08:00
fjy e3e932a4d4 refactor extensions into core and contrib 2016-03-08 17:12:09 -08:00
Fangjin Yang 8e36e6fa43 Merge pull request #2610 from dclim/add-combineText-doc
add combineText property and cleanup batch ingestion doc
2016-03-08 12:54:16 -08:00
dclim df29667a89 add combineText property and cleanup batch ingestion doc 2016-03-08 13:10:34 -07:00
Himanshu Gupta 0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Slim Bouguerra 623e89aa54 skip corrupt message 2016-03-04 08:30:40 -06:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory and rowFlushBoundary
To bring consistency to docs and source, this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000. It is lower now on the grounds that
a default that is somewhat less efficient but works is better than one
that reaches for the stars and may result in
"OutOfMemoryError: Java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen 1fe277ee29 Merge pull request #2367 from se7entyse7en/feature-rackspace-cloud-files-static-firehose
Adds support to use Rackspace's cloudfiles as static firehose
2016-02-25 17:31:06 -08:00
Gian Merlino 3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
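The counter semantics above can be summarized in a small sketch. The class, its `record` method, and the field-count inputs are hypothetical simplifications. Only the counter names and the reportParseExceptions behavior come from the commit message.

```java
final class ParseCounters {
  /** Hypothetical stand-in for the parser's ParseException. */
  static final class ParseException extends RuntimeException {
    ParseException(String message) {
      super(message);
    }
  }

  private final boolean reportParseExceptions;
  long processed;
  long thrownAway;
  long unparseable;

  ParseCounters(boolean reportParseExceptions) {
    this.reportParseExceptions = reportParseExceptions;
  }

  /** Records one row, given how many fields parsed and whether the rejection policy discarded it. */
  void record(int fieldsParsed, int fieldsTotal, boolean rejectedByPolicy) {
    boolean fullyParsed = fieldsTotal > 0 && fieldsParsed == fieldsTotal;
    if (!fullyParsed && reportParseExceptions) {
      // Any parse failure, partial or total, surfaces as an exception.
      throw new ParseException(fieldsParsed + " of " + fieldsTotal + " fields parsed");
    }
    if (fieldsParsed == 0) {
      unparseable++; // nothing salvageable
    } else if (rejectedByPolicy) {
      thrownAway++;
    } else {
      processed++; // includes partially parsed rows when not reporting exceptions
    }
  }
}
```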
Himanshu Gupta 21b0b8a07d new coordinator endpoint to get list of used segment given a dataSource and list of intervals 2016-02-21 23:17:58 -06:00
Himanshu Gupta 09ffcae4ae give user the option to specify the segments for dataSource inputSpec 2016-02-21 23:15:31 -06:00
Fangjin Yang 083f019a48 Merge pull request #2465 from druid-io/more-doc-fix
more doc fixes
2016-02-17 11:00:38 -08:00
fjy 7da6594bfe more doc fixes 2016-02-17 09:43:47 -08:00
Gian Merlino 3a996216bd Multivalued dimensions can be compressed since 0.8.0. 2016-02-17 08:33:21 -08:00
Himanshu f6eebf5884 Merge pull request #2422 from rasahner/docMinorFixes
some minor doc changes
2016-02-09 10:03:22 -06:00
Robin 1d57e3267d some minor doc changes 2016-02-09 08:20:53 -06:00
fjy 6fc5bcb1ef fix docs 2016-02-08 13:40:53 -08:00
fjy 003f54e268 add doc rendering 2016-02-04 14:21:59 -08:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
Lou Marvin Caraig 9de57eb1c8 Added documentation 2016-02-02 14:32:12 +01:00
Björn Zettergren d373573c25 DOCs: Missing 'type' for leaveIntermediate
Added missing 'Boolean' as type for leaveIntermediate row in table TuningConfig
2016-01-29 14:42:19 +01:00
Himanshu Gupta b3437825f0 add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found 2016-01-26 17:23:55 -06:00
binlijin cd1c71ceb4 rename persistBackgroundCount to numBackgroundPersistThreads 2016-01-22 14:29:41 +08:00
Nishant dcb7830330 Merge pull request #984 from drcrallen/thread-priority-rebase
Use thread priorities. (aka set `nice` values for background-like tasks)
2016-01-21 15:02:34 +05:30
Charles Allen 2a69a58570 Merge pull request #2149 from binlijin/master
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Charles Allen 2e1d6aaf3d Use thread priorities. (aka set `nice` values for background-like tasks)
* Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
* Each exec service has its own Task Factory, which assigns a priority to the tasks it spawns. Therefore each priority class has a unique exec service
* Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take the default, >0 means high. It is up to each implementation to determine how to handle these numbers
* Add options to ForkingTaskRunner
  * Add "-XX:+UseThreadPriorities" default option
  * Add "-XX:ThreadPriorityPolicy=42" default option
* AbstractTask - Removed unneeded @JsonIgnore on priority
* Added priority to RealtimePlumber executors. All sub-executors (non-query-runners) get Thread.MIN_PRIORITY
* Add persistThreadPriority and mergeThreadPriority to realtime tuning config
2016-01-20 14:00:31 -08:00
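A rough sketch of the per-priority executor idea, using only `java.lang.Thread` and `java.util.concurrent`. Because the commit leaves the interpretation of `taskPriority` to each implementation, the mapping here (an offset from `NORM_PRIORITY`, clamped to the valid range) is an assumption. The class and method names are hypothetical. Java thread priorities only affect OS scheduling when the JVM is configured for it, which is presumably why the commit adds the `-XX:` options.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class PriorityThreads {
  /** Maps taskPriority (<0 low, 0 default, >0 high) onto java.lang.Thread's priority range. */
  static int toThreadPriority(int taskPriority) {
    long p = (long) Thread.NORM_PRIORITY + taskPriority; // long avoids int overflow
    return (int) Math.max(Thread.MIN_PRIORITY, Math.min(Thread.MAX_PRIORITY, p));
  }

  /** A ThreadFactory whose threads all share one priority. */
  static ThreadFactory factory(String prefix, int threadPriority) {
    AtomicInteger count = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + count.getAndIncrement());
      t.setPriority(threadPriority);
      t.setDaemon(true);
      return t;
    };
  }

  /** One executor per priority class, each backed by its own factory. */
  static ExecutorService executor(String prefix, int taskPriority) {
    return Executors.newSingleThreadExecutor(factory(prefix, toThreadPriority(taskPriority)));
  }
}
```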
Logan Linn c3bdaefe1f Update batch-ingestion.md
Fix documented type of the `dataGranularity` config
2016-01-19 17:20:47 -08:00
binlijin 8e43e2c446 Do persist IncrementalIndex in another thread in IndexGeneratorReducer 2016-01-20 09:20:09 +08:00
Kurt Young 82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
Zhao Weinan 5e57ddb8cc Adding avro support to realtime & hadoop batch indexing. 2016-01-05 10:21:27 +08:00
Robin 0961c0b703 trivial documentation fix 2016-01-04 12:39:10 -06:00
fjy 88f6b9b5ad Multiple improvements for docs 2016-01-02 21:54:54 -08:00
Himanshu Gupta 48de9dfafa doc update to make it easy to find how to do re-indexing or delta ingestion 2015-12-30 23:58:09 -06:00
fjy 398a3ec620 add docs for more specs 2015-12-17 18:06:30 -08:00
jon-wei c53bf85d83 Add docs and benchmark for JSON flattening parser 2015-12-09 16:13:30 -08:00
Himanshu Gupta efe3c9f4a5 update the examples for batch reindexing/delta ingestion to use "intervals" instead of deprecated "interval" 2015-12-06 00:22:20 -06:00
Himanshu Gupta 61aaa09012 support multiple intervals in dataSource input spec 2015-12-03 21:28:04 -06:00
jon-wei 95dca4440f Update data formats doc with info about JSON multi-value dimensions 2015-11-24 14:38:06 -08:00
sahner a4ed2ce2d1 fix formatting in schema-design 2015-11-17 16:50:53 -06:00
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Nishant efc49da073 fix doc - correct default value for maxRowsInMemory 2015-11-01 22:09:24 -08:00
Bingkun Guo 4914925d65 New extension loading mechanism
1) Stop using the Maven client to download extensions at runtime.
2) Provide a way to load Druid extensions and hadoop dependencies through the file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documentation on how to use this new extension loading mechanism.
5) Change the way the Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0
are packaged within the Druid tarball.
2015-10-21 14:22:36 -05:00
Gian Merlino 933cbdf780 Adjust realtime constraints in the docs. 2015-10-09 10:52:52 -07:00
Gian Merlino b29cbf97a6 Docs: Suggest hadoopyString parser for Hadoop. 2015-09-16 10:19:42 -07:00
Himanshu Gupta 075b6d4385 update ingestion faq to mention dataSource inputSpec as an option of reindexing via hadoop 2015-09-10 14:41:13 -05:00
Xavier Léauté d89b0fa76a Merge pull request #1662 from qix/pathFormat-doc
Add documentation for pathFormat in batch ingestion
2015-08-31 11:14:54 -07:00
Josh Yudaken 29c29b42d3 Add default value and link to joda docs 2015-08-31 11:09:54 -07:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Bingkun ae1f104c10 Fix batch ingestion doc 2015-08-26 15:16:21 -05:00
Gian Merlino 10946610f4 Merge pull request #1656 from druid-io/all-the-docs
more docs for common questions
2015-08-25 17:49:47 -07:00
fjy 4055f9ca48 more docs for common questions 2015-08-25 17:49:04 -07:00
sahner 3def847e28 add documentation about TimedShutoff firehose 2015-08-24 20:41:42 -05:00
Josh Yudaken 5e42aee49e Add documentation for pathFormat in batch ingestion 2015-08-24 14:39:57 -07:00
Himanshu Gupta cfd81bfac7 updating the docs on how to do hadoop batch re-ingestion and delta ingestion 2015-08-16 14:07:35 -05:00
fjy 012fff6616 fix firehose docs 2015-08-04 09:52:23 -07:00
Himanshu Gupta 7ee509bcd0 fix mysql references in tutorial docs 2015-07-30 22:05:05 -05:00
pdeva ef0439229d Specify dynamic dimension schema
Document how druid can dynamically infer dimension columns
2015-07-27 20:20:53 -07:00
sahner 4801de62a2 make "announce" the chathandler default in realtime node;
remove doc references to chathandler type "announce" since it is now the default
2015-07-27 12:14:28 -05:00
pdeva 76bf8ccd8c correct key name 2015-07-25 21:58:37 -07:00
fjy 92293ef094 Added section on best practices for schema design and a few other edits 2015-07-24 14:06:20 -07:00
Himanshu Gupta 119ec13d23 updating hadoop tuningConfig doc with useCombiner flag 2015-07-22 13:55:00 -05:00
Himanshu Gupta dd95ef77c0 recommend druid-hdfs-storage and hadoop dependencies to be in the classpath instead of added as an extension 2015-07-18 16:18:12 -05:00
Charles Allen e051e93d19 Merge pull request #1518 from RealROI/more-azure-features
Azure Blob Store support for Firehose and Indexing Service Logs
2015-07-17 16:10:22 -07:00
Zak Kristjanson 0bda7af52c Add more support for Azure Blob Store
Azure Blob Store support for Task Logs and a firehose for data ingestion
2015-07-17 15:38:21 -07:00
Shiyu Qiu bec8e8e23a fix doc data-formats.md 2015-07-15 17:13:33 -05:00
Tim 3b692fb6f7 fix #1525 - typo: "HadoopBatchIndexer" 2015-07-14 20:48:24 -07:00
fjy 08d00cc80f rework the realtime examples a bit; add more faq 2015-07-07 14:07:14 -07:00
sahner acd20e8c00 say explicitly that local firehose searches directories recursively for files 2015-07-05 14:46:44 -05:00
Fangjin Yang 2544f3655e Merge pull request #1457 from ravishrathod/rabbitmq-doc
updating doc for rabbitmq firehose
2015-06-23 08:24:49 -07:00
ravishrathod 9213fd3801 updating doc for rabbitmq firehose 2015-06-22 02:40:11 -04:00
fjy 9c74993559 fix protobuf impl and docs 2015-06-20 21:59:38 -07:00
fjy 74d8840414 Change tranquility links 2015-05-31 10:59:38 -07:00
Himanshu Gupta be4ecc4b91 in batch ingestion metadataUpdateSpec->type is derby, mysql etc and not metadata 2015-05-29 22:16:18 -05:00
Xavier Léauté d2346b6834 shorten links and file names
* remove redundant parts in file names
* delete unsupported "Druid-Personal-Demo-Cluster"
2015-05-29 20:55:42 -05:00
Himanshu Gupta 8edc2aaca3 renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well
2015-05-29 20:55:42 -05:00