Commit Graph

300 Commits

Author SHA1 Message Date
Atul Mohan c07678b143 Synchronization of lookups during startup of druid processes (#4758)
* Changes for lookup synchronization

* Refactor of Lookup classes

* Minor refactors and doc update

* Change coordinator instance to be retrieved by DruidLeaderClient

* Wait before thread shutdown

* Make disablelookups flag true by default

* Update docs

* Rename flag

* Move executorservice shutdown to finally block

* Update LookupConfig

* Refactoring and doc changes

* Remove lookup config constructor

* Revert Lookupconfig constructor changes

* Add tests to LookupConfig

* Make executorservice local

* Update LRM

* Move ListeningScheduledExecutorService to ExecutorCompletionService

* Move exception to outer block

* Remove check to see future is done

* Remove unnecessary assignment

* Add logging
2017-10-12 21:22:24 -05:00
Gian Merlino b20e3038b6 SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. (#4889)
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.

This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes #4203).

Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
  and SqlOperatorConversions rather being special cases. This simplifies
  the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.

Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
  and DATE_TRUNC.

* Add missing @Override annotation.

* Adjustments for forbidden APIs.

* Adjustments for forbidden APIs.

* Disable GROUP BY alias.

* Doc reword.
2017-10-10 12:44:05 -07:00
Gian Merlino 4e1d0f49d8 Docs: Fix link to broker configuration. (#4934) 2017-10-10 11:18:46 -07:00
Gian Merlino 2ce8123bdb Move scan-query from a contrib extension into core. (#4751)
* Move scan-query from a contrib extension into core.

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.

* Adjustments from review.

* Add back Select query.

* Adjust SQL docs.

* Restore SelectQuery link.
2017-09-13 09:51:24 -07:00
Gian Merlino 4909c48b0c SQL: Full TRIM support. (#4750)
* SQL: Full TRIM support.

- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING

* Remove unused import.

* Fix tests, add RTRIM / LTRIM.

* Remove unused imports.

* BTRIM and docs.

* Replace for with foreach.
2017-09-12 11:49:08 -07:00
Gian Merlino 9078925cab Docs for finalizingFieldAccess post-aggregator. (#4737) 2017-08-31 11:45:49 -07:00
Gian Merlino daf3c5f927 Add "round" option to cardinality and hyperUnique aggregators. (#4720)
* Add "round" option to cardinality and hyperUnique aggregators.

Also turn it on by default in SQL, to make math on distinct counts
work more as expected.

* Fix some compile errors.

* Fix test.

* Formatting.
2017-08-28 14:52:11 -07:00
Jihoon Son f3f2cd35e1 Array-based aggregation for groupBy query (#4576)
* Array-based aggregation

* Fix handling missing grouping key

* Handle invalid offset

* Fix compilation

* Add cardinality check

* Fix cardinality check

* Address comments

* Address comments

* Address comments

* Address comments

* Cleanup GroupByQueryEngineV2.process

* Change to Byte.SIZE

* Add flatMap
2017-08-03 20:04:54 +03:00
Gian Merlino d4ef0f6d94 Improved SQL support for floats and doubles. (#4598)
* Improved SQL support for floats and doubles.

- Use Druid FLOAT for SQL FLOAT, and Druid DOUBLE for SQL DOUBLE, REAL,
  and DECIMAL.
- Use float* aggregators when appropriate.
- Add tests involving both float and double columns.
- Adjust documentation accordingly.

* CR comments.

* Fix braces.
2017-07-25 13:54:44 -07:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Gian Merlino 16817e408d SQL + Expressions = Best friends forever. (#4360)
* SQL + Expressions = Best friends forever.

- Use expressions as a projection layer for anything that can't be
  expressed using traditional Druid extractionFns. Sometimes they're
  embedded directly (like "expression" filters, builtin aggregators,
  or "expression" post-aggregators). Sometimes they're referenced
  through virtual columns (like dimensionSpecs, which can't innately
  reference functions of more than one column without the virtual
  column layer).
- Add many new functions and operators, taking advantage of the
  expression capability (see the querying/sql.md doc).
- Improve consistency of constant reduction and of casting by
  using Druid expressions for this instead of Calcite's RexExecutor.

* Fix casting bug, and other code review comments.

* Fix docs.
2017-07-07 08:48:26 -07:00
jeffhartley 3e7f7720a1 update aggregations.md re: rollup (#4455)
noted that rollup could be on or off
2017-06-23 14:28:59 -07:00
Jonathan Wei 3b70995bb3 Configurable row limit for JDBC frames (#4417) 2017-06-16 17:07:40 -07:00
Amar Ramachandran fc80df339e Fix incorrect name (#4386) 2017-06-09 13:32:17 -04:00
kaijianding 551a89bd67 serialize DateTime As Long to improve json serde performance (#4038) 2017-06-06 10:08:51 -07:00
Jonathan Wei b90c28e861 Support limit push down for GroupBy (#3873)
* Support limit push down for GroupBy V2

* Use orderBy spec ordering when applying limit push down

* PR Comments

* Remove unused var

* Checkstyle fixes

* Fix test

* Add comment on non-final variables, fix checkstyle

* Address PR comments

* PR comments

* Remove unnecessary buffer reset

* Fix missing @JsonProperty annotation
2017-06-02 15:39:04 -07:00
fanjieqi 2e933e1413 fix a bug in select-query.md which the property_form lack of the『granularity』 (#4327)
There result would be {"error"=>"Unknown exception",
"errorMessage"=>nil, "errorClass"=>"java.lang.NullPointerException",
"host"=>nil} when the json lack of 『granularity』.
2017-05-30 17:04:39 -07:00
Kamal Gurala dcb07d6958 Option to configure default analysis types in SegmentMetadataQuery (#4259)
* Option to configure default analysis types

* Updated Docs and renamed

* Added serde tests and Null handling

* Fixed Documentation

* Updated implementation

* Updated implementation

* Updated implementation

* Added usingDefaultIntervals in Builder

* Updated implementation

* Updated implementation and added failing test

* filterSegments implementation updated

* Updated imlementation

* Padding

* Add missing Override

* Updated implementation

* Fixed a naming bug

* Fixed bug

* Removed comment
2017-05-26 12:12:39 -07:00
zwang180 2c55a935f8 Delete a duplicate "Bucket Extraction Function" section at the bottom of "Querying"-"DimensionSpec" page (#4331) 2017-05-25 14:16:00 -07:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
Himanshu 417714d228 additional lookup status discovery http endpoints at coordinator (#4228)
* additional lookup status discovery http endpoints at coordinator

* more changes

* jsonize the error msgs as well

* fix tests
2017-05-04 11:15:30 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
asrayousuf e4fbc2bc5b Updating the description of useCache (#4200)
Updating the description of useCache

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment
2017-04-25 10:26:15 -07:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Erik Dubbelboer 2cbc4764f8 Comparing dimensions to each other in a filter (#3928)
Comparing dimensions to each other using a select filter
2017-03-23 18:23:46 -07:00
Gian Merlino db15d494ca Update docs for query filter HavingSpecs. (#4063) 2017-03-15 13:59:09 -04:00
Gian Merlino 3216134f8c SQL: Make row extractions extensible and add one for lookups. (#3991)
This is a reopening of #3989, since that PR was merged to master prematurely
and accidentally.
2017-03-13 21:56:16 -07:00
Gian Merlino cab2e2f5d5 Add docs about filtering and indexes on numeric columns. (#4035) 2017-03-10 12:48:59 -08:00
Gian Merlino 960769c583 SQL: Fix example INFORMATION_SCHEMA query. (#4017) 2017-03-06 16:07:47 -08:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Gian Merlino 337f3870d8 Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007)
* Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided.

* Remove unused import.

* Use defaults in cache key.
2017-03-04 17:41:59 -08:00
Gian Merlino af5a4cce3c SQL: Clarify approximate distinct count behavior. (#4000) 2017-03-03 13:42:30 -08:00
Himanshu e7e3c2dc5a support singleThreaded flag for groupBy-v2 as well (#3992) 2017-03-03 23:43:06 +05:30
Gian Merlino 4a56d7d8a0 SQL: Ability to generate exact distinct count queries. (#3999) 2017-03-03 23:40:36 +05:30
Gian Merlino 3e8dbd59f8 Fix groupBy docs to reflect that 'v2' is default. (#3993) 2017-03-02 15:13:39 -08:00
Gian Merlino e63eefd7ff Revert "SQL: Make row extractions extensible and add one for lookups. (#3989)"
The PR was merged to master accidentally.

This reverts commit 23927a3c96.
2017-03-01 17:06:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Gian Merlino 23927a3c96 SQL: Make row extractions extensible and add one for lookups. (#3989)
* SQL: Make row extractions extensible and add one for lookups.

* Fix QuantileSqlAggregatorTest.
2017-03-01 17:03:43 -08:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Gian Merlino e7d01b67b6 Move SQL configs to sql.md. (#3959)
This puts all the SQL stuff in one place. It also makes life easier by
pointing out that configs be made in either common.runtime.properties
or the broker runtime.properties.
2017-02-22 08:37:24 -08:00
Jonathan Wei bc33b68b51 Use GroupBy V2 as default (#3953)
* Use GroupBy V2 as default

* Remove unused line

* Change assert to exception propagation
2017-02-18 07:40:40 -08:00
Gian Merlino ca6053d045 SQL: Resolve column type conflicts in favor of newer segments. (#3930)
* SQL: Resolve column type conflicts in favor of newer segments.

Helps with schema evolution from e.g. long -> float, which is supported
on the query side.

* Take columns from highest timestamp instead of max segment id.

* Fixes and docs.
2017-02-15 17:48:49 -08:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
Jihoon Son a459db68b6 Fine grained buffer management for groupby (#3863)
* Fine-grained buffer management for group by queries

* Remove maxQueryCount from GroupByRules

* Fix code style

* Merge master

* Fix compilation failure

* Address comments

* Address comments

- Revert Sequence
- Add isInitialized() to Grouper
- Initialize the grouper in RowBasedGrouperHelper.Accumulator
- Simple refactoring RowBasedGrouperHelper.Accumulator
- Add tests for checking the number of used merge buffers
- Improve docs

* Revert unnecessary changes

* change to visible to testing

* fix misspelling
2017-02-14 12:55:54 -08:00
DaimonPl a2875a4d91 pre-computed HLL support for hyperUnique aggregator (#3909) 2017-02-13 15:26:20 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Gian Merlino ac84a3e011 SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868)
* SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE.

* Fix bug with re-use of filtered approximate histogram aggregators.

Also add APPROX_QUANTILE tests for filtering and running on complex columns.
Includes some slight refactoring to allow tests to make DruidTables that
include complex columns.

* Remove unused import
2017-01-25 18:39:26 -08:00
Gian Merlino d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Jihoon Son c099977a5b Add an option to SearchQuery to choose a search query execution strategy (#3792)
* Add an option to SearchQuery to choose a search query execution strategy.

Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query

* Add SearchStrategy and SearchQueryExecutor

* Address comments

* Rename strategies and set UseIndexesStrategy as the default strategy

* Add a cost-based planner for auto strategy

* Add document

* Fix code style

* apply code style

* apply comments
2017-01-10 18:04:20 -08:00
Gian Merlino dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00
Jonathan Wei 2bfcc8a592 First and Last Aggregator (#3566)
* add first and last aggregator

* add test and fix

* moving around

* separate aggregator valueType

* address PR comment

* add finalize inner query and adjust v1 inner indexing

* better test and fixes

* java-util import fixes

* PR comments

* Add first/last aggs to ITWikipediaQueryTest
2016-12-16 15:26:40 -08:00
Himanshu ed322a4beb remove size from default analysisTypes list for segmentMetadata query (#3773) 2016-12-13 18:01:21 -08:00
Erik Dubbelboer bb9e35e1af Add Greatest and Least post aggregations (#3567) 2016-12-07 17:58:23 -08:00
Himanshu 45da7e48f1 groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672)
* sortByDimsFirst flag for groupBy query

* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>

* fix review comments

* fix review comments regarding removing code duplication of dim/time comparison

* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde

* remove unnecessary system.out.println

* make access static var NATURAL_NULLS_FIRST directly

* further review comments addressing
2016-12-06 09:48:56 -08:00
Gian Merlino 353fee79dd Add "asMillis" option to "timeFormat" extractionFn. (#3733)
This is useful for chaining extractionFns that all want to treat time as millis,
such as having a javascript extractionFn after a timeFormat.
2016-12-02 13:45:16 -08:00
Gian Merlino 102375d9bb Add "strlen" extractionFn. (#3731) 2016-12-02 12:08:51 -08:00
Gian Merlino 4c5d10f8a3 Add DimFilterHavingSpec. (#3727)
* Add DimFilterHavingSpec.

* Add test for DimFilterHavingSpec with extractionFns.
2016-12-02 10:04:30 -08:00
Gian Merlino e4465e63bd Fix ordering of sections on dimensionspecs.md. (#3722)
The Filtered and List DimensionSpecs were mixed in with the extraction functions.
2016-11-29 16:28:36 -08:00
Joan Viladrosa 2df98bcaa6 Fixed Missing commas in json example of Lookup (#3680) 2016-11-15 14:56:18 +09:00
Gian Merlino 2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
Gian Merlino f8d71fc602 groupBy: Fix maxMergingDictionarySize config. (#3488) 2016-09-22 10:02:33 -07:00
Gian Merlino e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Gian Merlino 76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Gian Merlino 1344e3c3af Clearer filter docs. (#3448) 2016-09-09 13:47:13 -07:00
Gian Merlino 1e3f94237e groupBy v2: Configurable load factor. (#3437)
Also change defaults:

- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
  for most heaps.
2016-09-07 14:14:59 -05:00
Gian Merlino e9050c2b4c TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411) 2016-08-31 20:58:53 +05:30
Jonathan Wei 4e91330a17 Use DimensionSpec in CardinalityAggregatorFactory (#3406)
* Use DimensionSpec in CardinalityAggregatorFactory

* Address PR comments

* Fix requiredFields()
2016-08-30 15:54:02 -07:00
rajk-tetration 362b9266f8 Adding filters for TimeBoundary on backend (#3168)
* Adding filters for TimeBoundary on backend

Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>

* updating TimeBoundaryQuery constructor in QueryHostFinderTest

* add filter helpers

* update filterSegments + test

* Conditional filterSegment depending on whether a filter exists

* Style changes

* Trigger rebuild

* Adding documentation for timeboundaryquery filtering

* added filter serialization to timeboundaryquery cache

* code style changes
2016-08-15 10:25:24 -07:00
Gian Merlino 8899affe48 Introduce standardized "Resource limit exceeded" error. (#3338)
Fixes #3336.
2016-08-09 10:50:56 -07:00
Gian Merlino 21bce96c4c More useful query errors. (#3335)
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.

With this patch, the error fields are:

- error, one of the specific strings "Query interrupted", "Query timeout",
  "Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
  in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
  in the causality chain.
- host, the host that failed the query.
2016-08-09 16:14:52 +08:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Dave Li bc20658239 groupBy nested query using v2 strategy (#3269)
* changed v2 nested query strategy

* add test for #3239

* update for new ValueMatcher interface and add benchmarks

* enable time filtering

* address PR comments

* add failing test for outer filter aggregator

* add helper class for sharing code

* update nested groupby doc

* move temporary storage instantiation

* address PR comment

* address PR comment 2
2016-08-01 18:30:39 -07:00
Jonathan Wei a6105cbb86 Add numeric StringComparator (#3270)
* Add numeric StringComparator

* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query

* Address PR comments, add multithreaded BoundDimFilter test

* Add comment on strlen tie handling

* Add timeseries interval filter benchmark

* Adjust docs

* Use jackson for StringComparator, address PR comments

* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)

* More TopNMetricSpec and SearchSortSpec tests

* Fix NewSearchSortSpec serde

* Update docs for new DimensionTopNMetricSpec

* Delete NumericDimensionTopNMetricSpec

* Delete old SearchSortSpec

* Rename NewSearchSortSpec to SearchSortSpec

* Add TopN numeric comparator benchmark, address PR comments

* Refactor OrderByColumnSpec

* Add null checks to NumericComparator and String->BigDecimal conversion function

* Add more OrderByColumnSpec serde tests
2016-07-29 15:44:16 -07:00
Gian Merlino 2553997200 Associate groupBy v2 resources with the Sequence lifecycle. (#3296)
This fixes a potential issue where groupBy resources could be allocated to
create a Sequence, but then the Sequence is never used, and thus the resources
are never freed.

Also simplifies how groupBy handles config overrides (this made the new
unit test easier to write).
2016-07-27 18:44:19 -07:00
kaijianding 3dc2974894 Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227)
* save TimestampSpec in metadata.drd

* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Gian Merlino 3ab4a4efbc Fix formatting in granularities doc. (#3229) 2016-07-08 09:29:58 -07:00
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add Ut for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
linbo.jin 8c76fe7b97 docs: change OR to AND inside query docs about multi-value dims (#3162)
* docs: replace OR by AND inside topnquery docs about multi value dimensions

* docs: replace OR by AND inside groupby docs about multi value dimensions
2016-06-17 08:54:18 -07:00
Kirill Kozlov 9f93be448e Fix logical operator in example (#3093) 2016-06-07 10:44:18 -07:00
Gian Merlino 99ee3f4dc3 Fixups, clarifications to lookup docs. (#3060) 2016-06-07 10:43:35 -07:00
Charles Allen fa41a6466a Cleanup the base lookup cluster wide config docs (#3061)
* Cleanup the base lookup cluster wide config docs

* Add better examples in lookups-cached-global.md

* Use actual valid stock lookups

* Fixed maps with :

* Add mix of lookups

* Better examples in extension

* Remove unneeded namespace requirement

* Add extra line space

* Add link to lookup tiers

* Renamed header
2016-06-07 10:42:41 -07:00
Charles Allen 8cac710546 Async lookups-cached-global by default (#3074)
* Async lookups-cached-global by default
* Also better lookup docs

* Fix test timeouts

* Fix timing of deserialized test

* Fix problem with 0 wait failing immediately
2016-06-03 15:58:10 -05:00
Gian Merlino 603fbbcc20 Fix docs for "contains" search spec. (#3066) 2016-06-02 19:03:40 -07:00
Vadim Ogievetsky 13c267bfee Added new line for site formatting (#3059) 2016-06-02 11:36:45 -07:00
Vadim Ogievetsky 767190d5db Clear up confusing wording (#3052)
There is no such thing as a "Java aggregator" in Druid from a user's point of view, there are just native aggregator that happen to be implemented in Java.
2016-06-01 15:41:50 -07:00
Charles Allen eaaad01de7 [QTL] Datasource as lookupTier (#2955)
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.

* Fix bad docs for lookups lookupTier

* Add Datasource name holder

* Move task and datasource to be pulled from Task file

* Make LookupModule pull from bound dataSource

* Fix test

* Fix code style on imports

* Fix formatting

* Make naming better

* Address code comments about naming
2016-05-17 15:44:42 -07:00
Parag Jain e3ea842cd3 add available query granularity strings (#2960) 2016-05-12 18:49:31 -07:00
Himanshu 8e2742b7e8 adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
Slim 55785267e4 postAgg filedName must match name of AGG (#2874) 2016-04-22 11:11:54 -07:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00
Gian Merlino e320d13385 Fix various broken links in the docs. (#2833) 2016-04-13 13:30:01 -07:00
Charles Allen 2b99f717e4 Move lookup config doc to proper location 2016-04-08 08:15:38 -07:00
Charles Allen f915a59138 Merge pull request #2691 from metamx/lookupExtrFn
Add ExtractionFn to LookupExtractor bridge
2016-04-06 09:13:08 -07:00
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Fangjin Yang eea7a47870 Merge pull request #2576 from navis/paging-from-next
Add option for select query to get next page without modifying returned paging identifiers
2016-04-01 13:50:36 -07:00
navis.ryu 077522a46f stringFormat extractionFn should be able to return null on null values (Fix for #2706) 2016-04-01 13:40:56 +09:00
navis.ryu 29bb00535b Add option for select query to get next page without modifying returned paging identifiers 2016-04-01 09:03:03 +09:00
Gian Merlino 1853f36e9f More consistent empty-set filtering behavior on multi-value columns.
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.

Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.

Fixes #2750.
2016-03-29 15:32:13 -07:00
Charles Allen 4764e86409 Add docs for RegisteredDimensionExtractionFn 2016-03-28 13:27:49 -07:00
Charles Allen ab324e4ac0 Move lookup docs that are in druid-proper back into lookups.md 2016-03-25 10:46:50 -07:00
Fangjin Yang a5d5529749 Merge pull request #2711 from gianm/filtered-aggregator-impls
All Filters should work with FilteredAggregators.
2016-03-23 13:37:21 -07:00
Gian Merlino dd86198902 All Filters should work with FilteredAggregators.
This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a
ValueMatcherFactory implementation to FilteredAggregatorFactory so it can
take advantage of existing makeMatcher(ValueMatcherFactory) implementations.

This patch also removes the Bound-based method from ValueMatcherFactory. Its
only user was the SpatialFilter, which could use the Predicate-based method.

Fixes #2604.
2016-03-23 12:24:01 -07:00
Gian Merlino 2dfd3877c0 Fix a bunch of broken links in the docs. 2016-03-23 10:21:28 -07:00
Fangjin Yang d1f8f2b2fd Merge pull request #2698 from druid-io/fix-ext-docs
refactor extensions into their own docs
2016-03-22 22:04:12 -07:00
fjy 943cbe6e76 refactor extensions into their own docs 2016-03-22 18:54:10 -07:00
Fangjin Yang 041350c31b Merge pull request #2701 from gianm/mvd-docs
Improved docs for multi-value dimensions.
2016-03-22 18:09:37 -07:00
Gian Merlino 451c0bc6d8 Merge pull request #2702 from pjain1/improve_docs
how to query in the querying section, correct default for select strategy, formatting
2016-03-22 16:40:35 -07:00
Parag Jain 39ecb9929d how to query, correct default for select strategy, formatting 2016-03-22 17:06:15 -05:00
Gian Merlino ff25325f3b Improved docs for multi-value dimensions.
- Add central doc for multi-value dimensions, with some content from other docs.
- Link to multi-value dimension doc from topN and groupBy docs.
- Fixes a broken link from dimensionspecs.md, which was presciently already
  linking to this nonexistent doc.
- Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes
  "multi-value") in favor of "multi-value".
2016-03-22 14:40:55 -07:00
Nishant 11b8d1ed70 Merge pull request #2686 from gianm/fix-analysistypes-docs
Fix analysisTypes docs for SegmentMetadataQuery.
2016-03-18 16:15:38 -07:00
Gian Merlino 76ae30604e Fix analysisTypes docs for SegmentMetadataQuery. 2016-03-18 13:17:33 -07:00
Charles Allen 5da9a280b6 Query Time Lookup - Dynamic Configuration 2016-03-18 09:45:05 -07:00
Slim cf342d8d3c Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility
[QTL][Lookup] lookup module with the snapshot utility
2016-03-17 11:39:47 -05:00
Slim Bouguerra 0c86b29ef0 lookup module with the snapshot utility 2016-03-17 09:20:41 -05:00
Fangjin Yang 8cea85816d Merge pull request #2668 from navis/fix-document-selectquery
Document for search query was not updated properly (Fix for #2662)
2016-03-15 20:34:27 -07:00
navis.ryu 71ee9e2aac Document for search query is not updated properly (Fix for #2662) 2016-03-16 09:22:26 +09:00
dclim 553b677971 caching doc fix 2016-03-15 17:09:33 -06:00
Fangjin Yang dbdbacaa18 Merge pull request #2260 from navis/cardinality-for-searchquery
Support cardinality for search query
2016-03-14 13:24:40 -07:00
navis.ryu be341bf4e3 Support cardinality for search query (Fix for #2260) 2016-03-12 09:51:01 +09:00
Himanshu Gupta ca5de3f583 only allow lowering maxResults and maxIntermediateRows from groupBy query context 2016-03-08 15:03:59 -06:00
Himanshu Gupta 099acb4966 allow groupBy max[Intermediate]Rows limit be overridable by context 2016-03-07 15:22:41 -06:00
jon-wei fd3782522c Rename 'replaceMissingValues...' parameters in RegexExtractionFn 2016-02-24 13:12:56 -08:00
Fangjin Yang 083f019a48 Merge pull request #2465 from druid-io/more-doc-fix
more doc fixes
2016-02-17 11:00:38 -08:00
fjy 7da6594bfe more doc fixes 2016-02-17 09:43:47 -08:00
Jonathan Wei d63eec65a1 Merge pull request #2208 from navis/metadataquery-minmax
Support min/max values for metadata query
2016-02-11 17:28:07 -08:00
navis.ryu dd2375477a Support min/max values for metadata query (#2208) 2016-02-12 09:35:58 +09:00
navis.ryu 4d63196535 Support dimension spec for select query 2016-02-12 08:54:28 +09:00
Slim 368988d187 Merge pull request #2291 from druid-io/lookupManager
Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.
2016-02-11 16:07:27 -06:00
Slim Bouguerra 438a4a9970 fix docs about search query limit 2016-02-11 13:20:59 -06:00
Slim Bouguerra 4e119b7a24 Adding lookup ref manager and lookup dimension spec impl 2016-02-11 12:11:51 -06:00
fjy 003f54e268 add doc rendering 2016-02-04 14:21:59 -08:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
navis.ryu 55a888ea2f time-descending result of select queries 2016-01-29 10:06:05 +09:00
Rafael Abbondanza 145c65c72d Updates number of parts in a topN query
This threw me off a bit, so I'm sure it may throw others off, too.
Updating from 10 to 11 parts.
2016-01-25 10:29:25 -05:00
Gian Merlino d416279c14 SegmentMetadataQuery support for returning aggregators. 2016-01-21 17:27:25 -08:00
Gian Merlino 87c8046c6c Add StorageAdapter#getColumnTypeName, and various SegmentMetadataQuery adjustments.
SegmentMetadataQuery stuff:

- Simplify implementation of SegmentAnalyzer.
- Fix type names for realtime complex columns; this used to try to merge a nice type
  name (like "hyperUnique") from mmapped segments with the word "COMPLEX" from incremental
  index segments, leading to a merge failure. Now it always uses the nice name.
- Add hasMultipleValues to ColumnAnalysis.
- Add tests for both mmapped and incremental index segments.
- Update docs to include errorMessage.
2016-01-21 15:50:33 -08:00
Slim Bouguerra 78feb3a13e adding lower and upper extraction fn 2016-01-21 08:59:05 -06:00
zhxiaog 3459a202ce fixed #1873, add ability to express CONCAT as an extractionFn 2016-01-18 15:03:17 -08:00
Keuntae Park 238dd3be3c support cascade execution of extraction filters in extraction dimension spec 2016-01-18 11:10:19 +09:00
navis.ryu 18479bb757 time-descending result of timeseries queries 2016-01-13 12:23:01 +09:00