Commit Graph

1217 Commits

Author SHA1 Message Date
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
michaelschiff 66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and (#3176)
'segment/underReplicated/count' (#3173)
2016-06-23 14:53:15 -07:00
Gian Merlino da660bb592 DumpSegment tool. (#3182)
Fixes #2723.
2016-06-23 14:37:50 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
linbo.jin 8c76fe7b97 docs: change OR to AND inside query docs about multi-value dims (#3162)
* docs: replace OR by AND inside topnquery docs about multi value dimensions

* docs: replace OR by AND inside groupby docs about multi value dimensions
2016-06-17 08:54:18 -07:00
Fangjin Yang 07288c8fc0 update compares some more (#3158)
* update compares some more

* fix

* fix

* fix
2016-06-16 18:34:43 -07:00
Gian Merlino c12712e8b8 Move "libraries.md" out of docs, onto the main site. (#3159) 2016-06-16 18:14:35 -07:00
Fangjin Yang 6c2fd75e46 update vs spark doc (#3116)
* update vs spark doc

* update based on comments
2016-06-15 10:30:19 -07:00
Gian Merlino 7da4a283a9 Add missing layout: toc to TOC. (#3144) 2016-06-14 10:48:05 -07:00
Gian Merlino dc2bf9efa5 Update absolute TOC links. (#3138)
See druid-io/druid-io.github.io#286.
2016-06-13 17:57:52 -07:00
Gian Merlino 3b3e772748 Add --no-default-remote-repositories flag to pull-deps. (#3120) 2016-06-13 17:01:18 +05:30
michaelschiff 7294ea87c3 link to statsd metrics emitter docs from development/extensions.html doc page (#3125) 2016-06-10 16:27:16 -07:00
Gian Merlino 5321ba3e8f Switch to absolute TOC (#3110)
Depends on druid-io/druid-io.github.io#282
2016-06-07 21:39:34 -07:00
Jonathan Wei c5dbf364e3 Fix JSON flatten docs, add link to path expression tester (#3105) 2016-06-07 14:39:57 -07:00
Kirill Kozlov 4ab675e863 Fix command name in example (#3088) 2016-06-07 10:44:27 -07:00
Kirill Kozlov 9f93be448e Fix logical operator in example (#3093) 2016-06-07 10:44:18 -07:00
Gian Merlino 99ee3f4dc3 Fixups, clarifications to lookup docs. (#3060) 2016-06-07 10:43:35 -07:00
Charles Allen fa41a6466a Cleanup the base lookup cluster wide config docs (#3061)
* Cleanup the base lookup cluster wide config docs

* Add better examples in lookups-cached-global.md

* Use actual valid stock lookups

* Fixed maps with :

* Add mix of lookups

* Better examples in extension

* Remove unneeded namespace requirement

* Add extra line space

* Add link to lookup tiers

* Renamed header
2016-06-07 10:42:41 -07:00
Charles Allen 8cac710546 Async lookups-cached-global by default (#3074)
* Async lookups-cached-global by default
* Also better lookup docs

* Fix test timeouts

* Fix timing of deserialized test

* Fix problem with 0 wait failing immediately
2016-06-03 15:58:10 -05:00
David Lim a2290a8f05 support seamless config changes (#3051) 2016-06-03 13:50:19 -07:00
Gian Merlino 2db5f49f35 Fix JavaScriptConfig. (#3062) 2016-06-02 23:59:00 -07:00
Gian Merlino 603fbbcc20 Fix docs for "contains" search spec. (#3066) 2016-06-02 19:03:40 -07:00
Vadim Ogievetsky 13c267bfee Added new line for site formatting (#3059) 2016-06-02 11:36:45 -07:00
Parag Jain 44237e25d9 fix duration format and number format (#3057) 2016-06-02 10:09:21 -07:00
Erik Dubbelboer b4737336e5 Added info about Google Cloud Storage (#3056) 2016-06-02 10:06:07 -07:00
Vadim Ogievetsky 767190d5db Clear up confusing wording (#3052)
There is no such thing as a "Java aggregator" in Druid from a user's point of view; there are just native aggregators that happen to be implemented in Java.
2016-06-01 15:41:50 -07:00
Gian Merlino cd5c5419bb Make docs deploying better. (#3040)
- Make redirects for old links based on _redirects.json
- Replace #{DRUIDVERSION} tokens in docs with current version
- Allow origins named something other than "origin"
- Can use either s3cmd or awscli, depending on availability
2016-05-31 15:34:58 -07:00
David Lim f6c39cc844 Kafka task minimum message time (#3035)
* add KafkaIndexTask support for minimumMessageTime

* add Kafka supervisor support for lateMessageRejectionPeriod
2016-05-31 11:37:00 -07:00
scusjs ebb6831770 Remove stray ',' in jobProperties; Jackson cannot parse it (#3012) 2016-05-26 09:46:33 -07:00
Charles Allen 245077b47f Fix formatting in lookups-cached-global.md (#3009) 2016-05-24 17:28:39 -07:00
Charles Allen c738c0e1cd Silly Typo in docs 2016-05-24 13:31:58 -07:00
Charles Allen 8024b915e2 [QTL] Implement LookupExtractorFactory of namespaced lookup (#2926)
* support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespaced lookup extensions

- druid-namespace-lookup and druid-kafka-extraction-namespace are modified
- However, druid-namespace-lookup still has configuration for ON/OFF
  HEAP cache manager selection, which is not a namespace-wide
  configuration but a node-wide configuration, as multiple namespaces share
  the same cache manager

* update KafkaExtractionNamespaceTest to reflect argument signature changes

* Add more synchronization functionality to NamespaceLookupExtractorFactory

* Remove old way of using extraction namespaces

* resolve compile error by supporting LookupIntrospectHandler

* Remove kafka lookups

* Remove unused stuff

* Fix start and stop behavior to be consistent with new javadocs

* Remove unused strings

* Add timeout option

* Address comments on configurations and improve docs

* Add more options and update hash key and replaces

* Move monitoring to the overriding classes

* Add better start/stop logging

* Remove old docs about namespace names

* Fix bad comma

* Add `@JsonIgnore` to lookup factory

* Address code review comments

* Remove ExtractionNamespace from module json registration

* Fix problems with naming and initialization. Add tests

* Optimize imports / reformat

* Fix future not being properly cancelled on failed initial scheduling

* Fix delete returns

* Add more docs about whole introspection

* Add `/version` introspection point for lookups

* Add more tests and address comments

* Add StaticMap extraction namespace for testing. Also add a bunch of tests

* Move cache system property to `druid.lookup.namespace.cache.type`

* Make VERSION lower case

* Change poll period to 0ms for StaticMap

* Move cache key to bytebuffer

* Change hashCode and equals on static map extraction fn

* Add more comments on StaticMap

* Address comments

* Make scheduleAndWait use a latch

* Sanity renames and fix imports

* Remove extra info in docs

* Fix review comments

* Strengthen failure on start from warn to error

* Address comments

* Rename namespace-lookup to lookups-cached-global

* Fix injective mis-naming
* Also add serde test
2016-05-24 10:56:40 -07:00
Nishant 0ac1b27d53 Allow manually setting of shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting of shutoffTime for EventReceiverFirehose

Allow dynamically setting shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
Nishant dea4391a49 fix broken links (#3003) 2016-05-23 06:38:21 -07:00
Fangjin Yang 00de26c76a fix extensions docs (#2995)
* fix extensions docs

* fix mistakes
2016-05-19 14:01:06 -07:00
Charles Allen eaaad01de7 [QTL] Datasource as lookupTier (#2955)
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.

* Fix bad docs for lookups lookupTier

* Add Datasource name holder

* Move task and datasource to be pulled from Task file

* Make LookupModule pull from bound dataSource

* Fix test

* Fix code style on imports

* Fix formatting

* Make naming better

* Address code comments about naming
2016-05-17 15:44:42 -07:00
Shekhar Gulati c41bfe50d0 Using quotes around the cp (#2934) 2016-05-16 15:16:48 -07:00
Parag Jain e3ea842cd3 add available query granularity strings (#2960) 2016-05-12 18:49:31 -07:00
Joe Pettersson 2288c78395 chore_fix-quickstart-docs (#2946)
Fixes a small grammatical error in the `./docs/content/tutorials/quickstart.md` whereby a sentence didn't make sense
2016-05-10 09:52:24 -07:00
Slim 45b2e65d75 [QTL] adding listDelimiter to lookup parser spec (#2941)
* adding listDelimiter to lookup parser spec

* cleaning code
2016-05-10 15:41:16 +05:30
Gian Merlino b8af84d1fc Update tutorials to tranquility v0.8.0. (#2937) 2016-05-09 11:50:37 -07:00
Gian Merlino fffa9c8265 Fix flattenSpec docs, "nested" should be "path". (#2924) 2016-05-05 08:59:41 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Charles Allen 44e52acfc0 Link up metrics configuration to what they mean (#2921) 2016-05-04 10:30:02 -07:00
Himanshu 8e2742b7e8 adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
Navis Ryu 45a3a26ef7 Add more math functions (#2822)
* Add more math functions

* added function list
2016-05-03 10:55:13 -07:00
Gian Merlino e680665f1c Fix Avro parseSpec example, "type" should be "format". (#2918) 2016-05-03 09:22:43 -07:00
Himanshu 6c5bf91f9a publish metric numJettyConns to see how the number of active Jetty connections changes over time (#2839)
This can be compared with the number of active queries to see if requests are waiting in the Jetty queue.
2016-05-02 14:08:25 -07:00
Charles Allen 6b957aa072 [QTL] Make URI Extraction Namespace take more sane arguments (#2738)
* Make URI Extraction Namespace take more sane arguments
* Fixes https://github.com/druid-io/druid/issues/2669

* Update docs

* Rename error message

* Undo overzealous deletion of docs

* Explain caching mechanism a bit more in docs
2016-05-02 12:54:34 -07:00
Charles Allen 54b717bdc3 [QTL] Move kafka-extraction-namespace to the Lookup framework. (#2800)
* Move kafka-extraction-namespace to the Lookup framework.

* Address comments

* Fix missing kafka introspection

* Fix tests to be less racy

* Make testing a bit more lenient

* Make tests even more forgiving

* Add comments to kafka lookup cache method

* Move startStopLock to just use started

* Make start() and stop() idempotent

* Forgot to update test after last change, test now accounts for idempotency

* Add extra idempotency on stop check

* Add more descriptive docs of behavior
2016-05-02 09:45:13 -07:00
michaelschiff 2203a812bc statsd-emitter (#2410) 2016-04-28 18:41:02 -07:00
David Lim 890bdb543d doc fixes (#2897) 2016-04-28 15:34:58 -07:00
Slim 58510d826b fix emit wait time (#2869) 2016-04-26 17:07:03 -07:00
Slim 55785267e4 postAgg fieldName must match name of AGG (#2874) 2016-04-22 11:11:54 -07:00
binlijin 9151099e08 add document for druid.segmentCache.numBootstrapThreads (#2872) 2016-04-22 12:06:08 +08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00
Gian Merlino c74391e54c JavaScript: Ability to disable. (#2853)
Fixes #2852.
2016-04-21 09:43:15 -05:00
Nishant dbf63f738f Add ability to filter segments for specific dataSources on broker without creating tiers (#2848)
* Add back FilteredServerView removed in a32906c7fd to reduce memory usage using watched tiers.

* Add functionality to specify "druid.broker.segment.watchedDataSources"
2016-04-19 10:10:06 -07:00
Gaurav Kumar f5822faca3 Fixed wrong parseSpec in Avro Hadoop Parser (#2846)
`parseSpec` should contain `format` instead of `type`. It was wrongly defaulting to `tsv`
2016-04-16 11:34:54 -07:00
du00cs 639d0630b8 Jackson conflict workaround in Hadoop ingestion & parquet extension coordinate update (#2817) 2016-04-13 14:20:33 -07:00
Fangjin Yang 0c4a42bb6f change toc entry (#2834) 2016-04-13 13:45:07 -07:00
Gian Merlino e320d13385 Fix various broken links in the docs. (#2833) 2016-04-13 13:30:01 -07:00
Gian Merlino 725ee1401d Update tranquility version in the docs. (#2832) 2016-04-13 11:33:59 -07:00
Gian Merlino aa25cc1f68 Fix up Kafka tutorial (#2831)
1) Remove extraneous section
2) Remove -SNAPSHOT version
2016-04-13 11:33:45 -07:00
Fangjin Yang abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
Charles Allen ed5377465a add AirBnB Caravel to list of libraries (#2719) 2016-04-12 12:53:50 -07:00
Sébastien Launay 37d2ab623e Merge pull request #2815 from slaunay/documentation/hadoop-classpath-issue-fix-with-configuration
Doc for mapreduce.job.user.classpath.first=true
2016-04-12 10:51:51 -07:00
Nishant deb6ecf919 handle review comments for PR 2784
https://github.com/druid-io/druid/pull/2784#discussion_r59062021
2016-04-12 21:52:00 +05:30
Fangjin Yang bd6bd34cd8 Merge pull request #2090 from himanshug/math_exp
math expression support
2016-04-11 21:36:17 -07:00
Fangjin Yang 234125878a Merge pull request #2808 from metamx/moveLookupSaveStateConfigDocs
Move lookup config doc to proper location
2016-04-08 13:50:42 -06:00
Himanshu Gupta 308211cc18 math expression language with parser/lexer generated using ANTLR 2016-04-08 11:40:29 -05:00
Himanshu Gupta 36ccfbd20e math expression language with hand written parser/lexer 2016-04-08 11:40:29 -05:00
Charles Allen 2b99f717e4 Move lookup config doc to proper location 2016-04-08 08:15:38 -07:00
Nishant edd74f2b67 Allow Lite DataSegment Announcements
separate configs for skipping dimensions, metrics, and loadSpec

Add test

fix test comment

Add docs
2016-04-07 18:24:12 +05:30
Charles Allen f915a59138 Merge pull request #2691 from metamx/lookupExtrFn
Add ExtractionFn to LookupExtractor bridge
2016-04-06 09:13:08 -07:00
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Fangjin Yang eea7a47870 Merge pull request #2576 from navis/paging-from-next
Add option for select query to get next page without modifying returned paging identifiers
2016-04-01 13:50:36 -07:00
Fangjin Yang 4eb5a2c4f1 Merge pull request #2715 from navis/stringformat-null-handling
stringFormat extractionFn should be able to return null on null values (Fix for #2706)
2016-04-01 13:45:28 -07:00
navis.ryu 077522a46f stringFormat extractionFn should be able to return null on null values (Fix for #2706) 2016-04-01 13:40:56 +09:00
navis.ryu 29bb00535b Add option for select query to get next page without modifying returned paging identifiers 2016-04-01 09:03:03 +09:00
fjy 14dbc431ef clean up for extensions docs 2016-03-30 17:14:58 -07:00
Fangjin Yang a8b28879f1 Merge pull request #2369 from du00cs/master
[Feature] Extension: Offline Ingestion with limited Parquet Support
2016-03-29 23:19:35 -07:00
Fangjin Yang 23a8830bc2 Merge pull request #2757 from druid-io/fix-conf
Update libraries.md
2016-03-29 21:32:01 -07:00
DuNinglin [杜宁林] 0f67ff7dfb reorganize code folder according to recent upstream folder changes, separate it from Avro code and move it into extensions-contrib; docs rewritten too 2016-03-30 11:21:41 +08:00
Gian Merlino 1853f36e9f More consistent empty-set filtering behavior on multi-value columns.
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.

Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.

Fixes #2750.
2016-03-29 15:32:13 -07:00
r4ruchir 4bff008d65 Update libraries.md
Adding embedded-druid information in helper libraries
2016-03-29 15:16:36 -07:00
Fangjin Yang 1e02eeab13 Merge pull request #2683 from metamx/default_retry
Better defaults for Retry policy for task actions
2016-03-29 08:02:59 -07:00
fjy c418a55638 cleanup distinct count agg 2016-03-28 17:29:41 -07:00
Fangjin Yang 62c1dc7a09 Merge pull request #2602 from binlijin/distinctcount
implement special distinctcount
2016-03-28 17:20:17 -07:00
Fangjin Yang 9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Charles Allen 4764e86409 Add docs for RegisteredDimensionExtractionFn 2016-03-28 13:27:49 -07:00
Gian Merlino dbdfcd2443 Fix extension reference in Kafka namespaced lookup docs.
The reference to io.druid.extensions:kafka-extraction-namespace is wrong (should
be druid-kafka-extraction-namespace) and unnecessary (the extension id is written
at the top of the doc file).
2016-03-28 09:23:24 -07:00
Fangjin Yang a0216dcf7d Merge pull request #2735 from metamx/fixlookupDocs
Move lookup docs that are in druid-proper back into lookups.md
2016-03-26 15:38:48 -07:00
Charles Allen ab324e4ac0 Move lookup docs that are in druid-proper back into lookups.md 2016-03-25 10:46:50 -07:00
Gian Merlino 6d18382fb2 Fix broken link in datasketches-aggregators.md. 2016-03-25 09:32:40 -07:00
Himanshu Gupta e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Bingkun Guo 0fa04305a6 refine description for mergeBytesLimit 2016-03-24 13:17:24 -05:00
binlijin 2729efca71 implement special distinctcount 2016-03-24 11:11:11 +08:00