1424 Commits

Author SHA1 Message Date
DaimonPl
a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Himanshu
9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Himanshu
8cf7ad1e3a druid.coordinator.asOverlord.enabled flag at coordinator to make it an overlord too (#3711) 2017-02-13 15:03:59 -08:00
Jonathan Wei
ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Erik Dubbelboer
2aa2fa57b5 Simple doc fix (#3907) 2017-02-06 15:52:17 +05:30
Darío
8f4394ca49 Update segments.md (#3904) 2017-02-03 10:31:14 -06:00
Nishant Bangarwa
a457cded28 Druid Extension to enable Authentication using Kerberos. (#3853)
* Add extension for supporting kerberos security

- This PR adds an extension for supporting druid authentication via
Kerberos.
- Working on the docs.

* Add docs

* review comments

* more review comments

* Block all paths by default

* more review comments - use proper Oid

* Allow extensions to override httpclient for integration tests

* Add kerberos lock to prevent multithreaded issues.

* review comment - remove enabled flag and fix router injection

* Add Cookie Handling and more detailed docs

* review comment - rename DruidKerberosConfig -> AuthKerberosConfig

* review comments

* fix travis failure on jdk7
2017-02-02 14:55:21 -06:00
Jonathan Wei
182261f713 Allow configurable temp directory for query processing (#3893) 2017-02-02 10:22:28 -08:00
Gian Merlino
151ff6d064 flattenSpec: Document that "expr" is ignored for type "root". (#3884) 2017-01-31 10:27:20 -08:00
Gian Merlino
ac84a3e011 SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868)
* SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE.

* Fix bug with re-use of filtered approximate histogram aggregators.

Also add APPROX_QUANTILE tests for filtering and running on complex columns.
Includes some slight refactoring to allow tests to make DruidTables that
include complex columns.

* Remove unused import
2017-01-25 18:39:26 -08:00
Niketh Sabbineni
2b8d3c102b Remove throttling on drop segments (#3736)
* Remove throttling on drop

* Throttle loadqueuepeon segment change requests to ZK

* Make initial delay configurable, add docs, shutdown gracefully

* Make loadqueuepeon repeat delay configurable
2017-01-20 10:02:19 -08:00
Gian Merlino
d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
kaijianding
33ae9dd485 streaming version of select query (#3307)
* streaming version of select query

* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity

* respect query limit within historical

* use constant

* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner

* add some test for scan query

* add scan query document

* fix merge conflicts

* add compactedList resultFormat, this format is better for json ser/der

* respect query timeout

* respect query limit on broker

* use static consts and remove unused code
2017-01-19 16:09:53 -06:00
David Lim
ff52581bd3 IndexTask improvements (#3611)
* index task improvements

* code review changes

* add null check
2017-01-18 14:24:37 -08:00
Fokko Driesprong
31bea380eb Updated Apache Zookeeper to the latest stable version (#3841) 2017-01-12 13:39:29 -08:00
Gian Merlino
e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Gian Merlino
76620615a1 Properly respect the enableAvatica and enableJsonOverHttp options. (#3834) 2017-01-11 14:43:34 -06:00
Jihoon Son
c099977a5b Add an option to SearchQuery to choose a search query execution strategy (#3792)
* Add an option to SearchQuery to choose a search query execution strategy.

Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query

* Add SearchStrategy and SearchQueryExecutor

* Address comments

* Rename strategies and set UseIndexesStrategy as the default strategy

* Add a cost-based planner for auto strategy

* Add document

* Fix code style

* apply code style

* apply comments
2017-01-10 18:04:20 -08:00
Vinh Tran
dddeae813a Update caching.md typo (#3824)
* Update caching.md

Typo of Command vs Comma

* Update index.md

Fixing `Command` typo
2017-01-06 12:14:07 -08:00
Yuusaku Taniguchi
02519d5b64 Exhibitor Support (#3664)
* allow JsonConfigTesterBase to treat the fields of collections

* [Feature] Exhibitor Support (#3664)

This patch provides the integration of Druid & Netflix Exhibitor. Druid
currently use Apache Curator as ZooKeeper client. Curator can be
integrated with Exhibitor to achieve a live/updating list of the
ZooKeeper ensemble. This patch enables Druid to use this features.
2017-01-02 09:15:36 -08:00
Himanshu
4ca3b7f1e4 overlord helpers framework and tasklog auto cleanup (#3677)
* overlord helpers framework and tasklog auto cleanup

* review comment changes

* further review comments addressed
2016-12-21 15:18:55 -08:00
Nishant
f576a0ff14 Contrib Extension for Ambari Metrics Emitter (#3767)
* Contrib Extension for Ambari Metrics Emitter

extension to enable druid to send metrics to ambari metrics server
(https://cwiki.apache.org/confluence/display/AMBARI/Metrics)

review comments

switch to public repo

* review comments

* add docs

* fix pom version

* Add link for doc page in extensions.md

* remove unused imports

* review comments

review comments

remove unused dependency

review comment
2016-12-19 11:12:47 -08:00
Nishant
35160e5595 Add metrics for Query Count statistics (#3470)
* Add metrics for Query Count statistics

This PR adds a new metrics monitor “QueryCountStatsMonitor” which emits
three new metrics -
1) query/success/count - number of successful queries
2) query/failed/count - number of failed queries
3) query/interrupted/count - number of interrupted/timedout queries

fix bindings

* make fields final

* fix imports

* AsyncQueryForwardingServlet implement QueryStatsProvider

* remove unused import
2016-12-19 09:47:58 -08:00
David Lim
8eee259629 add documentation on segments generated (#3785) 2016-12-19 09:41:47 -08:00
Dongkyu Hwangbo
da007ca3c2 Replace caravel with superset (#3780) 2016-12-16 20:47:52 -08:00
Gian Merlino
dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00
Jonathan Wei
2bfcc8a592 First and Last Aggregator (#3566)
* add first and last aggregator

* add test and fix

* moving around

* separate aggregator valueType

* address PR comment

* add finalize inner query and adjust v1 inner indexing

* better test and fixes

* java-util import fixes

* PR comments

* Add first/last aggs to ITWikipediaQueryTest
2016-12-16 15:26:40 -08:00
Nishant
8cfcb95fbc Add Filtered and Composing request loggers (#3469)
* Add Filtered and Composing request loggers

Add Filtered and Composite Request loggers
- enables users to filter request logs for slow queries.

fix test

* review comments

* review comment

* remove unused import
2016-12-16 11:18:32 -08:00
Himanshu
ed322a4beb remove size from default analysisTypes list for segmentMetadata query (#3773) 2016-12-13 18:01:21 -08:00
Ninglin Du
469ab21091 [Feature] Thrift support for realtime and batch ingestion (#3418)
* Thrift ingestion plugin

1. thrift binary is platform dependent, use scrooge to generate java files to avoid style check failure
2. stream and hadoop ingesion are both supported, input format can be sequence file and lzo thrift block file.
3. base64 and protocol aware

change header

* fix conlicts in pom
2016-12-13 10:05:15 -08:00
kaijianding
4be3eb0ce7 report message gap, source gap and sink count in RealtimePlumber (#3744)
* report message gap, source gap and sink count in RealtimePlumber

* report message gap, sink count in Appenderator

* add ingest/events/sourceGap in metrics.md

* remove source gap
2016-12-13 11:23:02 -06:00
Erik Dubbelboer
bb9e35e1af Add Greatest and Least post aggregations (#3567) 2016-12-07 17:58:23 -08:00
Gian Merlino
943982b7b0 Configurable HTTP compression. (#3759)
* Configurable HTTP compression.

* Call real-time nodes real-time processes in docs.
2016-12-07 17:40:39 -08:00
Himanshu
06d0ef9c6c allow and load extensions with absolute paths in druid.extensions.loadList (#3747) 2016-12-06 17:40:23 -08:00
Himanshu
45da7e48f1 groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672)
* sortByDimsFirst flag for groupBy query

* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>

* fix review comments

* fix review comments regarding removing code duplication of dim/time comparison

* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde

* remove unnecessary system.out.println

* make access static var NATURAL_NULLS_FIRST directly

* further review comments addressing
2016-12-06 09:48:56 -08:00
Niketh Sabbineni
d904c79081 Normalized Cost Balancer (#3632)
* Normalized Cost Balancer

* Adding documentation and renaming to use diskNormalizedCostBalancer

* Remove balancer from the strings

* Update docs and include random cost balancer

* Fix checkstyle issues
2016-12-05 17:18:20 -08:00
Gian Merlino
353fee79dd Add "asMillis" option to "timeFormat" extractionFn. (#3733)
This is useful for chaining extractionFns that all want to treat time as millis,
such as having a javascript extractionFn after a timeFormat.
2016-12-02 13:45:16 -08:00
Gian Merlino
102375d9bb Add "strlen" extractionFn. (#3731) 2016-12-02 12:08:51 -08:00
Gian Merlino
4c5d10f8a3 Add DimFilterHavingSpec. (#3727)
* Add DimFilterHavingSpec.

* Add test for DimFilterHavingSpec with extractionFns.
2016-12-02 10:04:30 -08:00
Gian Merlino
e4465e63bd Fix ordering of sections on dimensionspecs.md. (#3722)
The Filtered and List DimensionSpecs were mixed in with the extraction functions.
2016-11-29 16:28:36 -08:00
Niketh Sabbineni
2640d170c3 Blacklist workers if they fail for too many times (#3643)
* Blacklist workers if they fail for too many times

* Adding documentation

* Changing to timeout to period and updating docs

* 1. Add configurable maxPercentageBlacklistWorkers
2. Rename variable

* Change maxPercentageBlacklistWorkers to double

* Remove thread.sleep
2016-11-29 12:38:56 +05:30
Erik Dubbelboer
9f7050e221 Fix some grammar and spelling mistakes (#3717) 2016-11-28 11:49:30 -08:00
Himanshu
7d37f675ba fix the documented property name for specifying avro reader schema (#3708) 2016-11-22 15:02:41 -08:00
Parag Jain
7ee6bb7410 option to reset offest automatically in case of OffsetOutOfRangeException (#3678)
* option to reset offset automatically in case of OffsetOutOfRangeException
if the next offset is less than the earliest available offset for that partition

* review comments

* refactoring

* refactor

* review comments
2016-11-21 16:29:46 -06:00
Jonathan Wei
7c63bee7f5 Add mapreduce.job.classloader.system.classes property to 'Other Hadoop Versions' docs (#3706) 2016-11-18 16:16:50 -08:00
Erik Dubbelboer
7d36f540e8 WIP: Add Google Storage support (#2458)
Also excludes the correct artifacts from #2741
2016-11-16 14:06:45 +05:30
Gian Merlino
607f64376c Update branch in deploy-docs.sh. (#3695) 2016-11-15 13:46:46 -08:00
Keuntae Park
094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Joan Viladrosa
2df98bcaa6 Fixed Missing commas in json example of Lookup (#3680) 2016-11-15 14:56:18 +09:00
Gian Merlino
bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00