1183 Commits

Author SHA1 Message Date
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Navis Ryu cd7337fc8a Calculate max split size based on numMapTask in DatasourceInputFormat (#2882)
* Calculate max split size based on numMapTask

* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
Gian Merlino dd4ec751d0 Update docs for working with Hadoop dependencies. (#3252)
- Attempt to make things clearer in general
- Point out that HDFS deep storage and MR jobs don't use the same loading mechanism
- Recommend using mapreduce.job.classloader = true when possible
2016-07-18 07:47:58 -05:00
Himanshu 3f82108d15 optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250) 2016-07-17 18:47:52 -07:00
Gian Merlino 90f5d8cd17 Fix path in cluster.md. (#3253) 2016-07-17 08:30:20 -07:00
Gian Merlino 6a03a0cfec Fix ingest/persist/backPressure docs. (#3243) 2016-07-13 21:56:28 -07:00
Gian Merlino 3ab4a4efbc Fix formatting in granularities doc. (#3229) 2016-07-08 09:29:58 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Charles Allen 3f1681c16c Caffeine cache extension (#3028)
* Initial commit of caffeine cache

* Address code comments

* Move and fixup README.md a bit

* Improve caffeine readme information

* Cleanup caffeine pom

* Address review comments

* Bump caffeine to 2.3.1

* Bump druid version to 0.9.2-SNAPSHOT

* Make test not fail randomly.

See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation

* Fix distribution and documentation

* Add caffeine to extensions.md

* Fix links in extensions.md

* Lexicographic
2016-07-06 15:42:54 -07:00
Gian Merlino b8a4f4ea7b DumpSegment: Add --dump bitmaps option. (#3221)
Also make --dump metadata respect --column.
2016-07-06 12:42:50 -07:00
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or lexicographic topN that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
Fangjin Yang 8eeae2e844 remove bad docs on setting up clusters (#3188) 2016-07-01 15:41:40 -05:00
Parag Jain 99844dfeb5 remove need for tmp extensions dir (#3211)
correct lib path relative to base distribution dir
2016-07-01 12:55:57 -07:00
Charles Allen 8b7d9750ee Update extension docs for global lookup module (#3206) 2016-06-29 12:51:52 -07:00
David Lim b24425a280 update docs with new behavior (#3200) 2016-06-28 16:17:04 -07:00
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add unit tests for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
michaelschiff 66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and (#3176)
'segment/underReplicated/count' (#3173)
2016-06-23 14:53:15 -07:00
Gian Merlino da660bb592 DumpSegment tool. (#3182)
Fixes #2723.
2016-06-23 14:37:50 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
linbo.jin 8c76fe7b97 docs: change OR to AND inside query docs about multi-value dims (#3162)
* docs: replace OR by AND inside topnquery docs about multi value dimensions

* docs: replace OR by AND inside groupby docs about multi value dimensions
2016-06-17 08:54:18 -07:00
Fangjin Yang 07288c8fc0 update compares some more (#3158)
* update compares some more

* fix

* fix

* fix
2016-06-16 18:34:43 -07:00
Gian Merlino c12712e8b8 Move "libraries.md" out of docs, onto the main site. (#3159) 2016-06-16 18:14:35 -07:00
Fangjin Yang 6c2fd75e46 update vs spark doc (#3116)
* update vs spark doc

* update based on comments
2016-06-15 10:30:19 -07:00
Gian Merlino 7da4a283a9 Add missing layout: toc to TOC. (#3144) 2016-06-14 10:48:05 -07:00
Gian Merlino dc2bf9efa5 Update absolute TOC links. (#3138)
See druid-io/druid-io.github.io#286.
2016-06-13 17:57:52 -07:00
Gian Merlino 3b3e772748 Add --no-default-remote-repositories flag to pull-deps. (#3120) 2016-06-13 17:01:18 +05:30
michaelschiff 7294ea87c3 link to statsd metrics emitter docs from development/extensions.html doc page (#3125) 2016-06-10 16:27:16 -07:00
Gian Merlino 5321ba3e8f Switch to absolute TOC (#3110)
Depends on druid-io/druid-io.github.io#282
2016-06-07 21:39:34 -07:00
Jonathan Wei c5dbf364e3 Fix JSON flatten docs, add link to path expression tester (#3105) 2016-06-07 14:39:57 -07:00
Kirill Kozlov 4ab675e863 Fix command name in example (#3088) 2016-06-07 10:44:27 -07:00
Kirill Kozlov 9f93be448e Fix logical operator in example (#3093) 2016-06-07 10:44:18 -07:00
Gian Merlino 99ee3f4dc3 Fixups, clarifications to lookup docs. (#3060) 2016-06-07 10:43:35 -07:00
Charles Allen fa41a6466a Cleanup the base lookup cluster wide config docs (#3061)
* Cleanup the base lookup cluster wide config docs

* Add better examples in lookups-cached-global.md

* Use actual valid stock lookups

* Fixed maps with :

* Add mix of lookups

* Better examples in extension

* Remove unneeded namespace requirement

* Add extra line space

* Add link to lookup tiers

* Renamed header
2016-06-07 10:42:41 -07:00
Charles Allen 8cac710546 Async lookups-cached-global by default (#3074)
* Async lookups-cached-global by default
* Also better lookup docs

* Fix test timeouts

* Fix timing of deserialized test

* Fix problem with 0 wait failing immediately
2016-06-03 15:58:10 -05:00
David Lim a2290a8f05 support seamless config changes (#3051) 2016-06-03 13:50:19 -07:00
Gian Merlino 2db5f49f35 Fix JavaScriptConfig. (#3062) 2016-06-02 23:59:00 -07:00
Gian Merlino 603fbbcc20 Fix docs for "contains" search spec. (#3066) 2016-06-02 19:03:40 -07:00
Vadim Ogievetsky 13c267bfee Added new line for site formatting (#3059) 2016-06-02 11:36:45 -07:00
Parag Jain 44237e25d9 fix duration format and number format (#3057) 2016-06-02 10:09:21 -07:00
Erik Dubbelboer b4737336e5 Added info about Google Cloud Storage (#3056) 2016-06-02 10:06:07 -07:00
Vadim Ogievetsky 767190d5db Clear up confusing wording (#3052)
There is no such thing as a "Java aggregator" in Druid from a user's point of view; there are just native aggregators that happen to be implemented in Java.
2016-06-01 15:41:50 -07:00
Gian Merlino cd5c5419bb Make docs deploying better. (#3040)
- Make redirects for old links based on _redirects.json
- Replace #{DRUIDVERSION} tokens in docs with current version
- Allow origins named something other than "origin"
- Can use either s3cmd or awscli, depending on availability
2016-05-31 15:34:58 -07:00
David Lim f6c39cc844 Kafka task minimum message time (#3035)
* add KafkaIndexTask support for minimumMessageTime

* add Kafka supervisor support for lateMessageRejectionPeriod
2016-05-31 11:37:00 -07:00
scusjs ebb6831770 rm , of jobProperties. jackson can not parse it (#3012) 2016-05-26 09:46:33 -07:00
Charles Allen 245077b47f Fix formatting in lookups-cached-global.md (#3009) 2016-05-24 17:28:39 -07:00
Charles Allen c738c0e1cd Silly Typo in docs 2016-05-24 13:31:58 -07:00
Charles Allen 8024b915e2 [QTL] Implement LookupExtractorFactory of namespaced lookup (#2926)
* support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespaced lookup extensions

- druid-namespace-lookup and druid-kafka-extraction-namespace are modified
- However, druid-namespace-lookup still has configuration about ON/OFF
  HEAP cache manager selection, which is not a namespace-wide
  configuration but a node-wide configuration, as multiple namespaces share
  the same cache manager

* update KafkaExtractionNamespaceTest to reflect argument signature changes

* Add more synchronization functionality to NamespaceLookupExtractorFactory

* Remove old way of using extraction namespaces

* resolve compile error by supporting LookupIntrospectHandler

* Remove kafka lookups

* Remove unused stuff

* Fix start and stop behavior to be consistent with new javadocs

* Remove unused strings

* Add timeout option

* Address comments on configurations and improve docs

* Add more options and update hash key and replaces

* Move monitoring to the overriding classes

* Add better start/stop logging

* Remove old docs about namespace names

* Fix bad comma

* Add `@JsonIgnore` to lookup factory

* Address code review comments

* Remove ExtractionNamespace from module json registration

* Fix problems with naming and initialization. Add tests

* Optimize imports / reformat

* Fix future not being properly cancelled on failed initial scheduling

* Fix delete returns

* Add more docs about whole introspection

* Add `/version` introspection point for lookups

* Add more tests and address comments

* Add StaticMap extraction namespace for testing. Also add a bunch of tests

* Move cache system property to `druid.lookup.namespace.cache.type`

* Make VERSION lower case

* Change poll period to 0ms for StaticMap

* Move cache key to bytebuffer

* Change hashCode and equals on static map extraction fn

* Add more comments on StaticMap

* Address comments

* Make scheduleAndWait use a latch

* Sanity renames and fix imports

* Remove extra info in docs

* Fix review comments

* Strengthen failure on start from warn to error

* Address comments

* Rename namespace-lookup to lookups-cached-global

* Fix injective mis-naming
* Also add serde test
2016-05-24 10:56:40 -07:00
Nishant 0ac1b27d53 Allow manually setting of shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting of shutoffTime for EventReceiverFirehose

Allow dynamically setting shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
Nishant dea4391a49 fix broken links (#3003) 2016-05-23 06:38:21 -07:00