Commit Graph

1434 Commits

Author SHA1 Message Date
Akash Dwivedi b43720c46d Correction in indexing-service configuration doc. (#4700) 2017-08-22 23:21:34 -05:00
Jonathan Wei 1bddfc089c Additional docs/log for direct memory usage (#4631)
* Additional docs/log for direct memory usage

* Tweak docs

* Doc rewording
2017-08-10 23:33:20 -07:00
Niketh Sabbineni eb0deba54a Fix NPE when locations are empty (#4667)
* Fix NPE when locations are empty

* Addressing comments
2017-08-10 23:31:28 -07:00
Yuewen Wang c821bc9a5a Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607)
* Implement "earlyMessageRejectionPeriod" config discussed in issue #4599
    * implement the logics of this param
    * Added doc of this config
    * Added unit tests of it

* Update KafkaSupervisor.java

ameliorate comment

* fix format

* fix bug when rebasing
2017-08-11 09:12:08 +09:00
Peter Cunningham ede7cf9eef Added support for where clauses to JDBC lookups. (#4643)
* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Required changes based on code review.

* Required changes based on code review.

* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Updates based on code review, mainly formatting and small refactor of
the buildLookupQuery method.

* Fixed broken buildLookupQuery method

* Removed empty line.

* Updates per review comments
2017-08-09 10:47:46 -07:00
Jihoon Son d5606bc558 Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549)
* Passing lockTimeout as a parameter for TaskLockbox.lock()

* Remove TIME_UNIT

* Fix tc fail

* Add taskLockTimeout to TaskContext

* Add caution
2017-08-08 18:21:07 -07:00
David Lim dd0b84e766 Fix bugs in RTR related to blacklisting, change default worker strategy (#4619)
* fix bugs in RTR related to blacklisting, change default worker strategy to equalDistribution

* code review and additional changes

* fix errorprone

* code review changes
2017-08-03 10:34:45 -07:00
Jihoon Son f3f2cd35e1 Array-based aggregation for groupBy query (#4576)
* Array-based aggregation

* Fix handling missing grouping key

* Handle invalid offset

* Fix compilation

* Add cardinality check

* Fix cardinality check

* Address comments

* Address comments

* Address comments

* Address comments

* Cleanup GroupByQueryEngineV2.process

* Change to Byte.SIZE

* Add flatMap
2017-08-03 20:04:54 +03:00
David Lim f4ba7a68ab fix missing column in caching documentation (#4612) 2017-07-28 13:53:04 -07:00
Roman Leventov 684cfbf889 Upgrade to server-metrics 0.5.0 (#4480)
* Upgrade to server-metrics 0.4.3

* Upgrade to 0.5.0

* Add CpuAcctDeltaMonitor description to docs
2017-07-26 08:56:00 -07:00
Gian Merlino d4ef0f6d94 Improved SQL support for floats and doubles. (#4598)
* Improved SQL support for floats and doubles.

- Use Druid FLOAT for SQL FLOAT, and Druid DOUBLE for SQL DOUBLE, REAL,
  and DECIMAL.
- Use float* aggregators when appropriate.
- Add tests involving both float and double columns.
- Adjust documentation accordingly.

* CR comments.

* Fix braces.
2017-07-25 13:54:44 -07:00
Himanshu ae6780f62a rolling upgrade order change to bring coordinator and overlord together (#4281)
* rolling upgrade order change to bring coordinator and overlord together

* mentioned merged Coordinator-Overlord in upgrade order doc

* revert autoscaling doc change

* auto scaling doc fix
2017-07-25 12:54:12 -05:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Roman Leventov b2865b7c7b Make possible to start Peon without DI loading of any querying-related stuff (#4516)
* Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory

* Extract QueryablePeonModule and add druid.modules.excludeList config

* Typo
2017-07-12 13:18:25 -05:00
Jihoon Son cc20260078 Early publishing segments in the middle of data ingestion (#4238)
* Early publishing segments in the middle of data ingestion

* Remove unnecessary logs

* Address comments

* Refactoring the patch according to #4292 and address comments

* Set the total shard number of NumberedShardSpec to 0

* refactoring

* Address comments

* Fix tests

* Address comments

* Fix sync problem of committer and retry push only

* Fix doc

* Fix build failure

* Address comments

* Fix compilation failure

* Fix transient test failure
2017-07-10 22:35:36 -07:00
Gian Merlino 16817e408d SQL + Expressions = Best friends forever. (#4360)
* SQL + Expressions = Best friends forever.

- Use expressions as a projection layer for anything that can't be
  expressed using traditional Druid extractionFns. Sometimes they're
  embedded directly (like "expression" filters, builtin aggregators,
  or "expression" post-aggregators). Sometimes they're referenced
  through virtual columns (like dimensionSpecs, which can't innately
  reference functions of more than one column without the virtual
  column layer).
- Add many new functions and operators, taking advantage of the
  expression capability (see the querying/sql.md doc).
- Improve consistency of constant reduction and of casting by
  using Druid expressions for this instead of Calcite's RexExecutor.

* Fix casting bug, and other code review comments.

* Fix docs.
2017-07-07 08:48:26 -07:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 6173570425 Add ExtensionsConfig.excludeModules (#4438)
* Add ExtensionsConfig.excludeModules

* Add branch

* Refactor Initialization.getFromExtensions()

* excludeModules -> moduleExcludeList

* Initialization.getFromExtensions() and getLoadedModules() should return Collection, not Set

* Fix doc
2017-06-28 14:01:31 -07:00
Gian Merlino 4c33d0a00f Add some new expression functions and macros. (#4442)
* Add some new expression functions and macros.

See misc/math-expr.md for the list of added functions, except for
"like", which previously existed but was not documented.

* Add easymock to datasketches tests.

* Add easymock to distinctcount tests.

* Add easymock to virtual-columns tests.

* Code review comments.

* Clean up code a bit.

* Add easymock to scan-query tests.

* Rework ExprMacros that have multiple impls.

* Improve test coverage.
2017-06-28 10:15:58 -07:00
Roman Leventov 2fa4b10145 More fine-grained DI for management node types. Don't allocate processing resources on Router (#4429)
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router

* Fix examples

* Fixes

* Revert Peon configs and add comments

* Remove qualifier
2017-06-27 22:58:01 -07:00
dgolitsyn e04b8be52e maxSegmentsInQueue in CoordinatorDinamicConfig (#4445)
* Add maxSegmentsInQueue parameter to CoordinatorDinamicConfig and use it in LoadRule to improve segments loading and replication time

* Rename maxSegmentsInQueue to maxSegmentsInNodeLoadingQueue

* Make CoordinatorDynamicConfig constructor private; add/fix tests; set default maxSegmentsInNodeLoadingQueue to 0 (unbounded)

* Docs added for maxSegmentsInNodeLoadingQueue parameter in CoordinatorDynamicConfig

* More docs for maxSegmentsInNodeLoadingQueue and style fixes
2017-06-27 22:58:36 -05:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
jeffhartley 3e7f7720a1 update aggregations.md re: rollup (#4455)
noted that rollup could be on or off
2017-06-23 14:28:59 -07:00
Fokko Driesprong ff501e8f13 Add Date support to the parquet reader (#4423)
* Add Date support to the parquet reader

Add support for the Date logical type. Currently this is not supported. Since the parquet
date is number of days since epoch gets interpreted as seconds since epoch, it will fails
on indexing the data because it will not map to the appriopriate bucket.

* Cleaned up code and tests

Got rid of unused json files in the examples, cleaned up the tests by
using try-with-resources. Now get the filenames from the json file
instead of hard coding them and integrated general improvements from
the feedback provided by leventov.

* Got rid of the caching

Remove the caching of the logical type of the time dimension column
and cleaned up the code a bit.
2017-06-22 15:56:08 -05:00
Jonathan Wei 3b70995bb3 Configurable row limit for JDBC frames (#4417) 2017-06-16 17:07:40 -07:00
Amar Ramachandran fc80df339e Fix incorrect name (#4386) 2017-06-09 13:32:17 -04:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Jordan Pittier 6697f3a62b [Doc]Update configuration/historical.md: correct default numBootstrapThreads value (#4376)
According to 4ace65a2af/server/src/main/java/io/druid/segment/loading/SegmentLoaderConfig.java (L81) if numBootstrapThreads is not set, it default to numLoadingThreads.
2017-06-07 10:24:42 -07:00
kaijianding 551a89bd67 serialize DateTime As Long to improve json serde performance (#4038) 2017-06-06 10:08:51 -07:00
Yuya Fujiwara 152d4e89ab Fix typo in the avro.md. (#4370) 2017-06-06 07:14:08 -07:00
David Lim 13ecf90923 Report Kafka lag information in supervisor status report (#4314)
* refactor lag reporting and report lag at status endpoint

* refactor offset reporting logic to fetch offsets periodically vs. at request time

* remove JavaCompatUtils

* code review changes

* code review changes
2017-06-05 13:26:25 -07:00
Jonathan Wei b90c28e861 Support limit push down for GroupBy (#3873)
* Support limit push down for GroupBy V2

* Use orderBy spec ordering when applying limit push down

* PR Comments

* Remove unused var

* Checkstyle fixes

* Fix test

* Add comment on non-final variables, fix checkstyle

* Address PR comments

* PR comments

* Remove unnecessary buffer reset

* Fix missing @JsonProperty annotation
2017-06-02 15:39:04 -07:00
Jonathan Wei 6daddf97c5 More documentation on expected interval format for coordinator endpoints (#4361) 2017-06-02 15:21:44 -07:00
Jihoon Son 1150bf7a2c Refactoring Appenderator Driver (#4292)
* Refactoring Appenderator

1) Added publishExecutor and handoffExecutor for background publishing and handing segments off
2) Change add() to not move segments out in it

* Address comments

1) Remove publishTimeout for KafkaIndexTask
2) Simplifying registerHandoff()
3) Add increamental handoff test

* Remove unused variable

* Add persist() to Appenderator and more tests for AppenderatorDriver

* Remove unused imports

* Fix strict build

* Address comments
2017-06-02 07:09:11 +09:00
fanjieqi 2e933e1413 fix a bug in select-query.md which the property_form lack of the『granularity』 (#4327)
There result would be {"error"=>"Unknown exception",
"errorMessage"=>nil, "errorClass"=>"java.lang.NullPointerException",
"host"=>nil} when the json lack of 『granularity』.
2017-05-30 17:04:39 -07:00
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create new pb obj everytime

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
Kamal Gurala dcb07d6958 Option to configure default analysis types in SegmentMetadataQuery (#4259)
* Option to configure default analysis types

* Updated Docs and renamed

* Added serde tests and Null handling

* Fixed Documentation

* Updated implementation

* Updated implementation

* Updated implementation

* Added usingDefaultIntervals in Builder

* Updated implementation

* Updated implementation and added failing test

* filterSegments implementation updated

* Updated imlementation

* Padding

* Add missing Override

* Updated implementation

* Fixed a naming bug

* Fixed bug

* Removed comment
2017-05-26 12:12:39 -07:00
zwang180 2c55a935f8 Delete a duplicate "Bucket Extraction Function" section at the bottom of "Querying"-"DimensionSpec" page (#4331) 2017-05-25 14:16:00 -07:00
Jihoon Son 11b7b1bea6 Add support for HttpFirehose (#4297)
* Add support for HttpFirehose

* Fix document

* Add documents
2017-05-25 16:13:04 -05:00
李成露(StefanLee) 22977780aa Doc (#4217)
* Fixed (#4216)

Modify the default value of  `druid.server.http.numThreads`  to `Math.max(10, (Runtime.getRuntime().availableProcessors() * 17) / 16 + 2) + 30`

* Fixed(#4216)

Modify the default value of  `druid.server.http.numThreads` to `max(10, (Number of cores * 17) / 16 + 2) + 30`

* Fixed(#4216)

Modify the default value of  `druid.server.http.numThreads`  to `max(10, (Number of cores * 17) / 16 + 2) + 30`
2017-05-23 17:04:52 +09:00
Gian Merlino adeecc0e72 Add /isLeader call to overlord and coordinator. (#4282)
This is useful for putting them behind load balancers or proxies, as it lets
the load balancer know which server is currently active through an http health
check.

Also makes the method naming a little more consistent between coordinator and
overlord code.
2017-05-18 20:46:13 -05:00
Jihoon Son 733dfc9b30 Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193)
* Add PrefetcheableTextFilesFirehoseFactory

* fix comment

* exception handling

* Fix wrong json property

* Remove ReplayableFirehoseFactory and fix misspelling

* Defer object initialization

* Add a temporaryDirectory parameter to FirehoseFactory.connect()

* fix when cache and fetch are disabled

* Address comments

* Add more test

* Increase timeout for test

* Add wrapObjectStream

* Move methods to Firehose from PrefetchableFirehoseFactory

* Cleanup comment

* add directory listing to s3 firehose

* Rename a variable

* Addressing comments

* Update document

* Support disabling prefetch

* Fix race condition

* Add fetchLock

* Remove ReplayableFirehoseFactoryTest

* Fix compilation error

* Fix test failure

* Address comments

* Add default implementation for new method
2017-05-18 15:37:18 +09:00
David Lim 8333043b7b add skipOffsetGaps flag (#4256) 2017-05-16 12:19:28 -06:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
Himanshu 462f6482df optionally add extensions to explicitly specified hadoopContainerClassPath (#4230)
* optionally add extensions to explicitly specified hadoopContainerClassPath

* note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly
2017-05-08 14:24:14 -05:00
Himanshu 417714d228 additional lookup status discovery http endpoints at coordinator (#4228)
* additional lookup status discovery http endpoints at coordinator

* more changes

* jsonize the error msgs as well

* fix tests
2017-05-04 11:15:30 -07:00
Parag Jain 4502c207af fix injection bug and documentation (#4243) 2017-05-03 15:07:43 -05:00
hzy001 0c464f4a84 Fix docs (#4225)
* Fix one typo

Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>

* Fix deprecated links

Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>
2017-05-01 09:55:43 -07:00
Jihoon Son 7411b18df9 Add BroadcastDistributionRule (#4077)
* Add BroadcastDistributionRule

* Add missing null check

* Rename variable 'colocateDataSource' to 'colocatedDatasource'

* Address comments

* Document for broadcast rules

* Drop segments which are not co-located anymore

* Remove duplicated segment loading and dropping

* Add caveat

* address comments
2017-05-01 09:55:17 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
Gian Merlino 631068b099 Fix broken DataSketches link. (#4221)
* Fix broken DataSketches link.

* Better fixed link.
2017-04-27 17:37:12 -07:00
Himanshu 40057570f3 doc update on overlord console url when coordinator is acting as overlord (#4213) 2017-04-26 15:03:54 -07:00
asrayousuf e4fbc2bc5b Updating the description of useCache (#4200)
Updating the description of useCache

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment
2017-04-25 10:26:15 -07:00
satishbhor d51097c809 Fix lz4 library incompatibility in kafka-indexing-service extension (#4115)
* Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Bumped Kafka version to 0.10.2.0 for : Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Replaced Lists.newArrayList() with Collections.singletonList() For Fix lz4 library incompatibility in kafka-indexing-service extension #4115
2017-04-25 12:23:51 +09:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Xiuming Chen 7e4e5510e0 Outdated property names (#4146)
Outdated property names?
2017-04-05 16:37:38 -07:00
Dongkyu Hwangbo 0d2e91ed50 Adding Kafka-emitter (#3860)
* Initial commit

* Apply another config: clustername

* Rename variable

* Fix bug

* Add retry logic

* Edit retry logic

* Upgrade kafka-clients version to the most recent release

* Make callback single object

* Write documentation

* Rewrite error message and emit logic

* Handling AlertEvent

* Override toString()

* make clusterName more optional

* bump up druid version

* add producer.config option which make user can apply another optional config value of kafka producer

* remove potential blocking in emit()

* using MemoryBoundLinkedBlockingQueue

* Fixing coding convention

* Remove logging every exception and just increment counting

* refactoring

* trivial modification

* logging when callback has exception

* Replace kafka-clients 0.10.1.1 with 0.10.2.0

* Resolve the problem related of classloader

* adopt try statement

* code reformatting

* make variables final

* rewrite toString
2017-04-04 14:07:43 -07:00
JackyWoo a0f2cf05d5 Add EqualDistributionWithAffinityWorkerSelectStrategy which balance w… (#3998)
* Add EqualDistributionWithAffinityWorkerSelectStrategy which balance work load within affinity workers.

* add docs to equalDistributionWithAffinity
2017-03-25 19:15:49 -07:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Himanshu de081c711b RealtimeIndexTask to support alertTimeout in context (#4089)
* RealtimeIndexTask to support alertTimeout in context and raise alert if task process exists after the timeout

* move alertTimeout config to tuningConfig and document
2017-03-24 12:48:12 -07:00
Gian Merlino b4289c0004 Remove "granularity" from IngestSegmentFirehose. (#4110)
It wasn't doing anything useful (the sequences were being concatted, and
cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE.
Changing it to Granularities.ALL gave me a 700x+ performance boost on a
small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding
making a lot of unnecessary column selectors.
2017-03-24 10:28:54 -07:00
Erik Dubbelboer 2cbc4764f8 Comparing dimensions to each other in a filter (#3928)
Comparing dimensions to each other using a select filter
2017-03-23 18:23:46 -07:00
Gian Merlino db15d494ca Update docs for query filter HavingSpecs. (#4063) 2017-03-15 13:59:09 -04:00
hzy001 c4f44c0590 Update the docs (#4059)
Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>
2017-03-15 10:32:29 -04:00
Gian Merlino 3216134f8c SQL: Make row extractions extensible and add one for lookups. (#3991)
This is a reopening of #3989, since that PR was merged to master prematurely
and accidentally.
2017-03-13 21:56:16 -07:00
Gian Merlino cab2e2f5d5 Add docs about filtering and indexes on numeric columns. (#4035) 2017-03-10 12:48:59 -08:00
Gian Merlino 960769c583 SQL: Fix example INFORMATION_SCHEMA query. (#4017) 2017-03-06 16:07:47 -08:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
kaijianding 19ac1c7c2c Add SameIntervalMergeTask for easier usage of MergeTask (#3981)
* Add SameIntervalMergeTask for easier usage of MergeTask

* fix a bug and add ut

* remove same_interval_merge_sub from Task.java and remove other no needed code
2017-03-06 11:21:32 -06:00
Gian Merlino 337f3870d8 Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007)
* Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided.

* Remove unused import.

* Use defaults in cache key.
2017-03-04 17:41:59 -08:00
Gian Merlino af5a4cce3c SQL: Clarify approximate distinct count behavior. (#4000) 2017-03-03 13:42:30 -08:00
Himanshu e7e3c2dc5a support singleThreaded flag for groupBy-v2 as well (#3992) 2017-03-03 23:43:06 +05:30
Gian Merlino 4a56d7d8a0 SQL: Ability to generate exact distinct count queries. (#3999) 2017-03-03 23:40:36 +05:30
Gian Merlino 3e8dbd59f8 Fix groupBy docs to reflect that 'v2' is default. (#3993) 2017-03-02 15:13:39 -08:00
Gian Merlino e63eefd7ff Revert "SQL: Make row extractions extensible and add one for lookups. (#3989)"
The PR was merged to master accidentally.

This reverts commit 23927a3c96.
2017-03-01 17:06:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Gian Merlino 23927a3c96 SQL: Make row extractions extensible and add one for lookups. (#3989)
* SQL: Make row extractions extensible and add one for lookups.

* Fix QuantileSqlAggregatorTest.
2017-03-01 17:03:43 -08:00
Aseem Bansal b8ba237f78 Update toc.md (#3704) 2017-03-01 14:33:39 -08:00
Fokko Driesprong add17fa7db Remove the metadataUpdateSpec from specfile (#3973)
Get rid of the metadataUpdateSpec section in the json example to
ingest parquet into druid. When this element is present, it will
fail start an indexing job.
2017-03-01 14:24:36 -08:00
Akash Dwivedi 94da5e80f9 Namespace optimization for hdfs data segments. (#3877)
* NN optimization for hdfs data segments.

* HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage
format.Docs update.

* Common utility function in DataSegmentPusherUtil.

* new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper

* reuse getHdfsStorageDirUptoVersion in
DataSegmentPusherUtil.getHdfsStorageDir()

* Addressed comments.

* Review comments.

* HdfsDataSegmentKiller requested changes.

* extra newline

* Add maprfs.
2017-03-01 09:51:20 -08:00
Jonathan Wei a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
kaijianding ef6a19c81b buildV9Directly in MergeTask and AppendTask (#3976)
* buildV9Directly in MergeTask and AppendTask

* add doc
2017-02-28 10:04:32 -08:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Aseem Bansal 1098ba7a7f Update toc.md (#3703) 2017-02-23 09:39:06 -08:00
Jihoon Son ebd100cbb0 Set default query granularity for null value (#3965) 2017-02-22 17:38:43 -08:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Gian Merlino e7d01b67b6 Move SQL configs to sql.md. (#3959)
This puts all the SQL stuff in one place. It also makes life easier by
pointing out that configs be made in either common.runtime.properties
or the broker runtime.properties.
2017-02-22 08:37:24 -08:00
Jonathan Wei bc33b68b51 Use GroupBy V2 as default (#3953)
* Use GroupBy V2 as default

* Remove unused line

* Change assert to exception propagation
2017-02-18 07:40:40 -08:00
michaelschiff e5fb0e1ff5 New property for each metric that tells the StatsDEmitter to convert metric values from range 0-1 to 0-100. This (#3936)
prevents rates and percentages expressed as Doubles (0.xx) from being rounded down to 0.
2017-02-16 13:55:56 -08:00
Gian Merlino ca6053d045 SQL: Resolve column type conflicts in favor of newer segments. (#3930)
* SQL: Resolve column type conflicts in favor of newer segments.

Helps with schema evolution from e.g. long -> float, which is supported
on the query side.

* Take columns from highest timestamp instead of max segment id.

* Fixes and docs.
2017-02-15 17:48:49 -08:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
Jihoon Son a459db68b6 Fine grained buffer management for groupby (#3863)
* Fine-grained buffer management for group by queries

* Remove maxQueryCount from GroupByRules

* Fix code style

* Merge master

* Fix compilation failure

* Address comments

* Address comments

- Revert Sequence
- Add isInitialized() to Grouper
- Initialize the grouper in RowBasedGrouperHelper.Accumulator
- Simple refactoring RowBasedGrouperHelper.Accumulator
- Add tests for checking the number of used merge buffers
- Improve docs

* Revert unnecessary changes

* change to visible to testing

* fix misspelling
2017-02-14 12:55:54 -08:00
Gian Merlino 78b0d134ae Require Java 8 and include some Java 8 dependencies. (#3914)
* Require Java 8 and include some Java 8 dependencies.

- Upgrade Jetty to 9.3.16.v20170120.
- Upgrade DataSketches to 0.8.4.
- Bundle caffeine-cache by default.
- Still target Java 7 when compiling base Druid classes.

* Update cluster, quickstart docs.

* Remove oraclejdk7 from travis.yml.
2017-02-14 12:51:51 -08:00
DaimonPl a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Himanshu 8cf7ad1e3a druid.coordinator.asOverlord.enabled flag at coordinator to make it an overlord too (#3711) 2017-02-13 15:03:59 -08:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Erik Dubbelboer 2aa2fa57b5 Simple doc fix (#3907) 2017-02-06 15:52:17 +05:30