Commit Graph

564 Commits

Author SHA1 Message Date
Gian Merlino 5048ab3e96 Add metrics to the native queries underpinning SQL. (#4561)
* Add metrics to the native queries underpinning SQL.

This is done by factoring out the metrics and request log emitting
code from QueryResource into a new QueryLifecycle class. That class
is used by both QueryResource and the SQL DruidSchema and QueryMaker.

Also fixes a couple of bugs in QueryResource:

- RequestLogLine start time was set to `TimeUnit.NANOSECONDS.toMillis(startNs)`,
  which is incorrect since absolute nanos cannot be converted to millis.
- DruidMetrics.makeRequestMetrics was called with null `query` on
  unparseable queries, which led to spurious "Unable to log query"
  errors.

Partial fix for #4047.

* Code style

* Remove unused imports.

* Fix tests.

* Remove unused import.
2017-07-24 21:26:27 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Gian Merlino 2be7068f6e Fixes and improvements to SQL metadata caching. (#4551)
* Fixes and improvements to SQL metadata caching.

Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient.

SQL changes:
- Cache metadata on a per-segment level, in addition to per-dataSource, so
  we don't need to re-query all segments whenever a single new one appears.
  This should lower the load placed on the cluster by metadata queries.
- Fix race condition in DruidSchema that can cause us to miss metadata. It was
  possible to notice new segments, then issue a query, and have that query
  not actually hit those segments, and not notice that it didn't hit those segments.
  Then, the metadata from those segments would be ignored.
- Fix assumption in DruidSchema that all segments are immutable. Now, mutable
  segments are periodically re-queried.
- Fix inappropriate re-use of SchemaPlus. Now we create one for each planning
  cycle, rather than sharing one. It caches table objects, which we want to
  avoid, since it can cause stale metadata. We do the caching in DruidSchema
  so we don't need the SchemaPlus caching.

Server changes:
- Add a TimelineCallback to TimelineServerView, for callers that want to get updates
  when the timeline has been modified.
- Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This
  allows it to accept queries that are segment-descriptor-based rather than
  intervals-based. In particular it will now support MultipleSpecificSegmentSpec.

* Fix DruidSchema, and unused imports.

* Remove unused import.

* Fix SqlBenchmark.
2017-07-20 10:14:15 -07:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Roman Leventov 60cdf94677 Add PMD and prohibit unnecessary fully qualified class names in code (#4350)
* Add PMD and prohibit unnecessary fully qualified class names in code

* Extra fixes

* Remove extra unnecessary fully-qualified names

* Remove qualifiers

* Remove qualifier
2017-07-17 22:22:29 +09:00
Gian Merlino 16817e408d SQL + Expressions = Best friends forever. (#4360)
* SQL + Expressions = Best friends forever.

- Use expressions as a projection layer for anything that can't be
  expressed using traditional Druid extractionFns. Sometimes they're
  embedded directly (like "expression" filters, builtin aggregators,
  or "expression" post-aggregators). Sometimes they're referenced
  through virtual columns (like dimensionSpecs, which can't innately
  reference functions of more than one column without the virtual
  column layer).
- Add many new functions and operators, taking advantage of the
  expression capability (see the querying/sql.md doc).
- Improve consistency of constant reduction and of casting by
  using Druid expressions for this instead of Calcite's RexExecutor.

* Fix casting bug, and other code review comments.

* Fix docs.
2017-07-07 08:48:26 -07:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Gian Merlino 22aad08a59 ExpressionPostAggregator: Automatically finalize inputs. (#4406)
* ExpressionPostAggregator: Automatically finalize inputs.

Raw HyperLogLogCollectors and such aren't very useful. When writing
expressions like `x / y` users will expect `x` and `y` to be finalized.

* Fix un-merge.

* Code review comments.

* Remove unnecessary ImmutableMap.copyOf.
2017-06-17 13:22:47 -07:00
Jonathan Wei 3b70995bb3 Configurable row limit for JDBC frames (#4417) 2017-06-16 17:07:40 -07:00
Jonathan Wei cc815eec81 Create/close yielder in same thread for JDBC queries (#4415)
* Create/close yielder in same thread for JDBC queries

* PR comments

* More PR comments

* Add connectionId to DruidStatement executor
2017-06-16 16:50:33 -07:00
Goh Wei Xiang f68a0693f3 Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397)
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.

* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex

* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
  - Remove setting of default values
  - Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders

* - Removing unused imports.

* - Helper method for testing with IncrementalIndex.Builder

* - Correction on javadoc.

* Style fix
2017-06-16 16:04:19 -05:00
Gian Merlino e78d8584a1 JettyQosTest, DruidAvaticaHandlerTest: Extend timeout. (#4416)
Fixes #4408, probably.
2017-06-15 18:28:50 -07:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Gian Merlino 67b162a337 SQL: More forgiving Avatica server. (#4368)
* SQL: More forgiving Avatica server.

- Automatically close statements that are fully iterated or that have
  errors, to prevent dangling statements from causing clients to hit
  open statement limits.
- Empower client auto-reconnects by throwing NoSuchConnectionException
  when appropriate.
- Try to close empty connections when we hit the open connection limit,
  rather than failing the newly opened connection. Client
  auto-reconnections mean this shouldn't cause problems in practice.
- Improve concurrency of the server by making "connections" a
  concurrent map.
- Lower default connection timeout to PT5M from PT30M.

* Fix DruidStatement test.
2017-06-06 10:11:40 -07:00
Jonathan Wei d49e53e6c2 Timeout and maxScatterGatherBytes handling for queries run by Druid SQL (#4305)
* Timeout and maxScatterGatherBytes handling for queries run by Druid SQL

* Address PR comments

* Fix contexts in CalciteQueryTest

* Fix contexts in QuantileSqlAggregatorTest
2017-05-23 16:57:51 +09:00
Jonathan Wei e043bf88ec Add a ServerType for peons (#4295)
* Add a ServerType for peons

* Add toString() method, toString() test, unsupported type check

* Use ServerType enum in DruidServer and DruidServerMetadata
2017-05-22 17:24:59 -05:00
Gian Merlino 8ca7f9410e SQL: Add test for concurrent JDBC queries. (#4290) 2017-05-18 12:25:15 -07:00
Jihoon Son 5c0a7ad2f8 Make realtimes available for loading segments (#4148)
* Add ServerType

* Add realtimes to DruidCluster

* fix test fails

* Add SegmentManager

* Fix equals and hashCode of ServerHolder

* Address comments and add more tests

* Address comments
2017-05-18 10:03:39 -05:00
Roman Leventov b7a52286e8 Make @Override annotation obligatory (#4274)
* Make MissingOverride an error

* Make travis stript to fail fast

* Add missing Override annotations

* Comment
2017-05-16 13:30:30 -05:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Himanshu 5a5a2749cd improvements to coordinator lookups management (#3855)
* coordinator lookups mgmt improvements

* revert replaces removal, deprecate it instead

* convert and use older specs stored in db

* more tests and updates

* review comments

* add behavior for 0.10.0 to 0.9.2 downgrade

* incorporating more review comments

* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well

* wip on LookupCoordinatorManager

* lifecycle lock

* refactor thread creation into utility method

* more review comments addressed

* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2

* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop

* run lookup mgmt on leader coordinator only

* wip: changes to do multiple start() and stop() on LookupCoordinatorManager

* lifecycleLock fix usage in LookupReferencesManagerTest

* add LifecycleLock back

* fix license hdr

* some fixes

* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless

* address review comments

* addressing leventov's comments

* address charle's comments

* add IOE.java

* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt

* move thread creation utility method to Execs

* fix names

* add tests for LookupCoordinatorManager.lookupManagementLoop()

* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager

* address leventov comments

* remove LookupsStateWithMap and parameterize LookupsState

* address review comments

* address more review comments

* misc fixes
2017-04-28 08:41:38 -05:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Gian Merlino 9f4266fba1 Fix SortCollapseRule when inner order is DESC. (#4157)
* Fix SortCollapseRule when inner order is DESC.

* Remove unused import.
2017-04-12 15:39:45 +05:30
Roman Leventov 15f3a94474 Copy closer into Druid codebase (fixes #3652) (#4153) 2017-04-10 09:38:45 +09:00
Gian Merlino bbb61e638b SQL: Support for another form of filtered aggregator. (#4109)
* SQL: Support for another form of filtered aggregator.

* Fix comment, add test for MAX too.
2017-03-27 15:22:36 -07:00
Gian Merlino 90f9932bd3 SQL: Rule to collapse sort chains. (#4085)
Useful for queries like `SELECT * FROM (...) LIMIT X`, where the inner query
has an order by or limit in it.
2017-03-24 19:20:01 -07:00
Gian Merlino 76c4b6446e SQL: Fix handling of CURRENT_TIMESTAMP and friends in non-UTC timezones. (#4114) 2017-03-24 18:45:23 -07:00
Gian Merlino dd6c0ab509 Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn. (#4055)
* Add SQL REGEXP_EXTRACT function; add "index" to "regex" extractionFn.

* Fix tests.
2017-03-24 17:38:36 -07:00
Jonathan Wei 79f1a1d7f0 Allow float parameters for Bound/Selector/In filters on long columns (#4074)
* Allow float parameters for long filters

* Use BigDecimal intermediate form for string->long conversions

* PR comments

* PR comments
2017-03-23 14:18:05 -07:00
Gian Merlino 64248d31b6 SQL: Groundwork for views. (#3962)
* SQL: Groundwork for views.

They are not actually exposed to users at this point, but enough is there
to have some test cases in CalciteQueryTest.

* Remove unused imports.

* Fix injection problem.
2017-03-20 11:53:11 -07:00
Gian Merlino 403fbae7b1 SQL: Better error handling for HTTP API. (#4053)
* SQL: Better error handling for HTTP API.

* Fix test.
2017-03-15 14:18:00 -04:00
Gian Merlino 3216134f8c SQL: Make row extractions extensible and add one for lookups. (#3991)
This is a reopening of #3989, since that PR was merged to master prematurely
and accidentally.
2017-03-13 21:56:16 -07:00
Gian Merlino bad250fe6d SQL: Support for coercing to DECIMAL. (#4028)
Useful for running queries that involve math of ints and floats, which
Calcite types as decimal.
2017-03-13 16:29:23 -07:00
Gian Merlino af5a4cce3c SQL: Clarify approximate distinct count behavior. (#4000) 2017-03-03 13:42:30 -08:00
Gian Merlino 4a56d7d8a0 SQL: Ability to generate exact distinct count queries. (#3999) 2017-03-03 23:40:36 +05:30
Gian Merlino e63eefd7ff Revert "SQL: Make row extractions extensible and add one for lookups. (#3989)"
The PR was merged to master accidentally.

This reverts commit 23927a3c96.
2017-03-01 17:06:12 -08:00
Jonathan Wei 5fb1638534 Add default configuration for select query 'fromNext' parameter (#3986)
* Add default configuration for select query 'fromNext' parameter

* PR comments

* Fix PagingSpec config injection

* Injection fix for test
2017-03-01 17:05:35 -08:00
Gian Merlino 23927a3c96 SQL: Make row extractions extensible and add one for lookups. (#3989)
* SQL: Make row extractions extensible and add one for lookups.

* Fix QuantileSqlAggregatorTest.
2017-03-01 17:03:43 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-grunalrity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Gian Merlino 372b84991c Add virtual columns to timeseries, topN, and groupBy. (#3941)
* Add virtual columns to timeseries, topN, and groupBy.

* Fix GroupByTimeseriesQueryRunnerTest.

* Updates from review comments.
2017-02-22 13:16:48 -08:00
Jihoon Son 7200dce112 Atomic merge buffer acquisition for groupBys (#3939)
* Atomic merge buffer acquisition for groupBys

* documentation

* documentation

* address comments

* address comments

* fix test failure

* Addressed comments

- Add InsufficientResourcesException
- Renamed GroupByQueryBrokerResource to GroupByQueryResource

* addressed comments

* Add takeBatch() to BlockingPool
2017-02-22 14:49:37 -06:00
Gian Merlino ca6053d045 SQL: Resolve column type conflicts in favor of newer segments. (#3930)
* SQL: Resolve column type conflicts in favor of newer segments.

Helps with schema evolution from e.g. long -> float, which is supported
on the query side.

* Take columns from highest timestamp instead of max segment id.

* Fixes and docs.
2017-02-15 17:48:49 -08:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
Jihoon Son a459db68b6 Fine grained buffer management for groupby (#3863)
* Fine-grained buffer management for group by queries

* Remove maxQueryCount from GroupByRules

* Fix code style

* Merge master

* Fix compilation failure

* Address comments

* Address comments

- Revert Sequence
- Add isInitialized() to Grouper
- Initialize the grouper in RowBasedGrouperHelper.Accumulator
- Simple refactoring RowBasedGrouperHelper.Accumulator
- Add tests for checking the number of used merge buffers
- Improve docs

* Revert unnecessary changes

* change to visible to testing

* fix misspelling
2017-02-14 12:55:54 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
DaimonPl 93b71e265e Extract HLL related code to separate module (#3900) 2017-02-03 09:45:11 -08:00
Gian Merlino ac84a3e011 SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868)
* SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE.

* Fix bug with re-use of filtered approximate histogram aggregators.

Also add APPROX_QUANTILE tests for filtering and running on complex columns.
Includes some slight refactoring to allow tests to make DruidTables that
include complex columns.

* Remove unused import
2017-01-25 18:39:26 -08:00
Gian Merlino bb7c496d88 SQL: Use topN for single-dim queries with LIMIT but no ORDER BY. (#3867) 2017-01-20 09:59:28 -08:00
Gian Merlino 9cc3015ddd Bypass Calcite's SemiJoinRule and use our own. (#3843)
This simplifies DruidSemiJoin, which no longer needs to add aggregation back
in. It also allows some more kinds of queries to plan properly, like the one
added in "testTopNFilterJoin".
2017-01-19 19:51:14 -08:00
Gian Merlino d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
Gian Merlino b0232b4e40 Replace our AggregateValuesRule with Calcite's. (#3845) 2017-01-12 15:51:50 -06:00
Gian Merlino e86859b228 SQL support for nested groupBys. (#3806)
* SQL support for nested groupBys.

Allows, for example, doing exact count distinct by writing:

  SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)

Contrast with approximate count distinct, which is:

  SELECT COUNT(DISTINCT col) FROM druid.foo

* Add deeply-nested groupBy docs, tests, and maxQueryCount config.

* Extract magic constants into statics.

* Rework rules to put preconditions in the "matches" method.
2017-01-11 18:32:53 -08:00
Gian Merlino 76620615a1 Properly respect the enableAvatica and enableJsonOverHttp options. (#3834) 2017-01-11 14:43:34 -06:00
Gian Merlino 3c012305d1 SqlResource: Fix incorrect labeling of aliased columns. (#3829) 2017-01-07 12:33:53 -08:00
Gian Merlino a4f81a6471 Update to Calcite 1.11.0. (#3825) 2017-01-06 14:45:17 -08:00
Gian Merlino 1f35120c7e Downgrade to avatica-server 1.8.0, skip avatica-core. (#3813)
This matches the version bundled by Calcite 1.10.0.
2017-01-03 16:00:37 -08:00
Roman Leventov 33800122ad Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631) 2016-12-26 00:35:35 -06:00
Gian Merlino ebb4952f3f SQL: Support for descending timeseries. (#3790) 2016-12-19 11:19:15 -08:00
Gian Merlino dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00