Commit Graph

238 Commits

Author SHA1 Message Date
Gian Merlino 34a03b8e6c SQL: EXPLAIN improvements. (#4733)
* SQL: EXPLAIN improvements.

- Include query JSON in explain output.
- Fix a bug where semi-joins and nested groupBys were not fully explained.
- Fix a bug where limits were not included in "select" query explanations.

* Fix compile error.

* Fix compile error.

* Fix tests.
2017-09-01 09:35:13 -07:00
Himanshu 4c04083926 kafkaIndexTask unannounce service in final block (#4736) 2017-09-01 09:31:15 -07:00
Charles Allen bdfc6fe25e Move common TypeReference into JacksonUtils (#4738) 2017-08-31 13:40:16 -07:00
Parag Jain 594a66f3c0 add scheme to AsyncQueryForwardingServlet (#4688)
* add scheme to AsyncQueryForwardingServlet

* add sslContext binding for Router
2017-08-28 15:03:43 -07:00
hzy001 4f61dc66a9 Remove the deprecated variable localChildren (#4357)
Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>
2017-08-24 15:27:34 -05:00
Roman Leventov cacf63b007 Add AggregateCombiners (#4676)
* Add MetricCombiners

* Rename MetricCombiner to AggregateCombiner

* Spelling

* Fix TimestampAggregatorFactory.combine() and add makeAggregateCombiner() implementation

* Rename AggregateCombiner.combine() to fold()
2017-08-21 16:45:29 -07:00
Roman Leventov cbd1902db8 Add forbidden-apis plugin; prohibit using system time zone (#4611)
* Forbidden APIs WIP

* Remove some tests

* Restore io.druid.math.expr.Function

* Integration tests fix

* Add comments

* Fix in SimpleWorkerProvisioningStrategy

* Formatting

* Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest

* Address comments

* Fix GroupByMultiSegmentTest
2017-08-21 13:02:42 -07:00
Himanshu 74a64c88ab internal-discovery: interfaces for announcement/discovery, curator based impls (#4634)
* internal-discovery: interfaces for announcement/discovery, curator impls

* more tests

* address some review comments

* more fixes

* address more review comments

* simplify ObjectMapper setup in CuratorDruidNodeAnnouncerAndDiscoveryTest

* fix KafkaIndexTaskTest

* make lookupTier overridable via RealtimeIndexTask and KafkaIndexTask context

* make teamcity build happy
2017-08-16 13:07:16 -07:00
Parag Jain 725a144096 add localhost as advertised hostname (#4689)
* add localhost as advertised hostname

* set advertised.host.name to localhost for test kafka broker
2017-08-14 16:59:26 -07:00
Roman Leventov bf28d0775b Remove QueryRunner.run(Query, responseContext) and related legacy methods (#4482)
* Remove QueryRunner.run(Query, responseContext) and related legacy methods

* Remove local var
2017-08-11 09:12:38 +09:00
Yuewen Wang c821bc9a5a Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607)
* Implement "earlyMessageRejectionPeriod" config discussed in issue #4599
    * implement the logics of this param
    * Added doc of this config
    * Added unit tests of it

* Update KafkaSupervisor.java

ameliorate comment

* fix format

* fix bug when rebasing
2017-08-11 09:12:08 +09:00
Peter Cunningham ede7cf9eef Added support for where clauses to JDBC lookups. (#4643)
* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Required changes based on code review.

* Required changes based on code review.

* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Updates based on code review, mainly formatting and small refactor of
the buildLookupQuery method.

* Fixed broken buildLookupQuery method

* Removed empty line.

* Updates per review comments
2017-08-09 10:47:46 -07:00
Roman Leventov 7454fd86a0 Polymorphic numeric getters for ColumnValueSelector (#4623)
* Add methods getFloat(), getDouble() and getLong() to ColumnValueSelector

* Fix copy-paste mistake in docs

* Spelling
2017-08-08 18:38:06 -07:00
Jihoon Son d5606bc558 Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549)
* Passing lockTimeout as a parameter for TaskLockbox.lock()

* Remove TIME_UNIT

* Fix tc fail

* Add taskLockTimeout to TaskContext

* Add caution
2017-08-08 18:21:07 -07:00
Roman Leventov f5d4171459 Prohibit for loops which could be foreach with IntelliJ (#4653)
* Replace for with foreach

* Replace for with for-each in GroupByQueryEngineV2

* Remove io.druid.collections.IntList
2017-08-08 18:05:33 -07:00
Charles Allen bbe7fb8c46 Better logging for S3DataSegmentPuller `getVersion` (#4657)
* Eventual consistency of S3 means a `404` can be thrown. It helps to know the URI that was attempted.
2017-08-08 16:21:22 +03:00
Roman Leventov aa7e4ae5e4 Enforce correct spacing with Checkstyle (#4651) 2017-08-05 10:18:25 -07:00
Jihoon Son f3f2cd35e1 Array-based aggregation for groupBy query (#4576)
* Array-based aggregation

* Fix handling missing grouping key

* Handle invalid offset

* Fix compilation

* Add cardinality check

* Fix cardinality check

* Address comments

* Address comments

* Address comments

* Address comments

* Cleanup GroupByQueryEngineV2.process

* Change to Byte.SIZE

* Add flatMap
2017-08-03 20:04:54 +03:00
Charles Allen 8921538251 Make AWSCredentialsConfig use PasswordProvider for the string matter (#4613)
* Make AWSCredentialsConfig use PasswordProvider for the string matter
* Fixes https://github.com/druid-io/druid/issues/3911

* Add unit tests
2017-07-29 15:48:49 -07:00
Roman Leventov 5929066dfb Add NamespaceLookupExtractorFactory.toString() (#4606) 2017-07-26 12:02:07 -07:00
Gian Merlino 5048ab3e96 Add metrics to the native queries underpinning SQL. (#4561)
* Add metrics to the native queries underpinning SQL.

This is done by factoring out the metrics and request log emitting
code from QueryResource into a new QueryLifecycle class. That class
is used by both QueryResource and the SQL DruidSchema and QueryMaker.

Also fixes a couple of bugs in QueryResource:

- RequestLogLine start time was set to `TimeUnit.NANOSECONDS.toMillis(startNs)`,
  which is incorrect since absolute nanos cannot be converted to millis.
- DruidMetrics.makeRequestMetrics was called with null `query` on
  unparseable queries, which led to spurious "Unable to log query"
  errors.

Partial fix for #4047.

* Code style

* Remove unused imports.

* Fix tests.

* Remove unused import.
2017-07-24 21:26:27 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Gian Merlino 2be7068f6e Fixes and improvements to SQL metadata caching. (#4551)
* Fixes and improvements to SQL metadata caching.

Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient.

SQL changes:
- Cache metadata on a per-segment level, in addition to per-dataSource, so
  we don't need to re-query all segments whenever a single new one appears.
  This should lower the load placed on the cluster by metadata queries.
- Fix race condition in DruidSchema that can cause us to miss metadata. It was
  possible to notice new segments, then issue a query, and have that query
  not actually hit those segments, and not notice that it didn't hit those segments.
  Then, the metadata from those segments would be ignored.
- Fix assumption in DruidSchema that all segments are immutable. Now, mutable
  segments are periodically re-queried.
- Fix inappropriate re-use of SchemaPlus. Now we create one for each planning
  cycle, rather than sharing one. It caches table objects, which we want to
  avoid, since it can cause stale metadata. We do the caching in DruidSchema
  so we don't need the SchemaPlus caching.

Server changes:
- Add a TimelineCallback to TimelineServerView, for callers that want to get updates
  when the timeline has been modified.
- Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This
  allows it to accept queries that are segment-descriptor-based rather than
  intervals-based. In particular it will now support MultipleSpecificSegmentSpec.

* Fix DruidSchema, and unused imports.

* Remove unused import.

* Fix SqlBenchmark.
2017-07-20 10:14:15 -07:00
Slim 71e7a4c054 Adding double colums supports (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimesionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
Gian Merlino 441ee56ba9 DataSegmentPusher: Add allowed hadoop property prefixes. (#4562)
* DataSegmentPusher: Add allowed hadoop property prefixes.

* Fix dots.
2017-07-18 10:16:12 -07:00
Roman Leventov 60cdf94677 Add PMD and prohibit unnecessary fully qualified class names in code (#4350)
* Add PMD and prohibit unnecessary fully qualified class names in code

* Extra fixes

* Remove extra unnecessary fully-qualified names

* Remove qualifiers

* Remove qualifier
2017-07-17 22:22:29 +09:00
Chris Gavin 960cb07ea6 Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474)
* Remove some unnecessary use of boxed types.

* Fix some incorrect format strings.

* Enable IDEA's MalformedFormatString inspection.

* Add a Checkstyle check for finding uses of incorrect logging packages.

* Fix some incorrect usages of the metamx logger.

* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.

* Fix some more places where the wrong number of arguments are provided to format strings.

* Suppress `MalformedFormatString` inspection on legacy logging test.

* Use @SuppressWarnings rather than a noinspection suppression comment.

* Fix some more incorrect format strings.

* Suppress some more incorrect format string warnings where the incorrect string is intentional.

* Log the aggregator when closing it fails.

* Remove some unneeded log lines.
2017-07-13 12:15:32 -07:00
Roman Leventov b2865b7c7b Make possible to start Peon without DI loading of any querying-related stuff (#4516)
* Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory

* Extract QueryablePeonModule and add druid.modules.excludeList config

* Typo
2017-07-12 13:18:25 -05:00
Akash Dwivedi a108d05f76 Use GenericIndexed v2 supported read() during deserializeColumn (#4463) 2017-07-11 10:18:25 -05:00
Akash Dwivedi 5f411f14af Timeout for LockAcquireAction (#4461)
* Timeout for LockAcquireAction

* Static inner class.

* Rebase changes.

* makeAlert and throw exception incase of overlapping interval.

* Addressed comments.

* remove unused import.

* Addressed comments
2017-07-11 18:59:32 +09:00
Jihoon Son cc20260078 Early publishing segments in the middle of data ingestion (#4238)
* Early publishing segments in the middle of data ingestion

* Remove unnecessary logs

* Address comments

* Refactoring the patch according to #4292 and address comments

* Set the total shard number of NumberedShardSpec to 0

* refactoring

* Address comments

* Fix tests

* Address comments

* Fix sync problem of committer and retry push only

* Fix doc

* Fix build failure

* Address comments

* Fix compilation failure

* Fix transient test failure
2017-07-10 22:35:36 -07:00
Jihoon Son 8ed25acc15 Fix a bug for CSVParser/DelimitedParser when empty column exists in the header row (#4504)
* Fix a bug when empty column exists in header row

* Address comments
2017-07-07 16:19:25 -07:00
Gian Merlino 16817e408d SQL + Expressions = Best friends forever. (#4360)
* SQL + Expressions = Best friends forever.

- Use expressions as a projection layer for anything that can't be
  expressed using traditional Druid extractionFns. Sometimes they're
  embedded directly (like "expression" filters, builtin aggregators,
  or "expression" post-aggregators). Sometimes they're referenced
  through virtual columns (like dimensionSpecs, which can't innately
  reference functions of more than one column without the virtual
  column layer).
- Add many new functions and operators, taking advantage of the
  expression capability (see the querying/sql.md doc).
- Improve consistency of constant reduction and of casting by
  using Druid expressions for this instead of Calcite's RexExecutor.

* Fix casting bug, and other code review comments.

* Fix docs.
2017-07-07 08:48:26 -07:00
Roman Leventov d168a4271e Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY (#4496)
* Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float

* Replace usages in comments

* Fix RTree

* Remove commented code

* Add tests
2017-07-07 09:10:13 -06:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Gian Merlino 4c33d0a00f Add some new expression functions and macros. (#4442)
* Add some new expression functions and macros.

See misc/math-expr.md for the list of added functions, except for
"like", which previously existed but was not documented.

* Add easymock to datasketches tests.

* Add easymock to distinctcount tests.

* Add easymock to virtual-columns tests.

* Code review comments.

* Clean up code a bit.

* Add easymock to scan-query tests.

* Rework ExprMacros that have multiple impls.

* Improve test coverage.
2017-06-28 10:15:58 -07:00
Jihoon Son 79fd5338e3 Get s3 objects directly from prefixes when listing is failed due to permission (#4444)
* Fall back to getObject when listing is failed due to permission

* Throws an exception when listing is not allowed on directory

* Fix error messages
2017-06-27 18:58:37 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Jihoon Son 5fec619284 Make KafkaLookupExtractorFactoryTest fast (#4466)
* Make KafkaLookupExtractorFactoryTest fast

* Use list

* Use Bytes
2017-06-26 10:15:28 -05:00
Himanshu 61c38b66ad exclude aws-java-sdk from hadoop-aws dep in hdfs-storage module (#4437)
* exclude aws-java-sdk from hdfs-storage module

* address review comments
2017-06-22 15:56:35 -05:00
Goh Wei Xiang f68a0693f3 Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397)
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.

* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex

* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
  - Remove setting of default values
  - Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders

* - Removing unused imports.

* - Helper method for testing with IncrementalIndex.Builder

* - Correction on javadoc.

* Style fix
2017-06-16 16:04:19 -05:00
Gian Merlino 17ef785618 Speed up sketch tests by merging fewer indexes. (#4413)
The tests go from 5 minutes to about 10 seconds. 1000 maxRowCount is still
low enough to get a few merges, so we're still exercising that functionality.
2017-06-15 14:47:55 -05:00
Roman Leventov 976492c186 Make PolyBind to fail if property value is not found (fixes #4369) (#4374)
* Make PolyBind to fail if property value is not found

* Fix test

* Add onHeap option in NamespaceExtractionModule

* Add PolyBind.createChoiceWithDefaultNoScope()

* Fix NPE

* Fix

* Configure MetadataStorageProvider option for MySQL, PostgreSQL and SQLServer

* Deprecate PolyBind.createChoiceWithDefault form with unused defaultKey

* Fix NPE
2017-06-13 09:45:43 -07:00
Roman Leventov c121845102 Avoid using Guava in DataSegmentPushers because of incompatibilities (#4391)
* Avoid using Guava in DataSegmentPushers because of Hadoop incompatibilities

* Clarify comments
2017-06-12 09:58:34 -07:00
Roman Leventov 5285eb961b Update dependencies (#4313)
* Update dependencies

* Downgrade curator

* Rollback aws-java-sdk dependency to 1.10.77

* Revert exclusions in integration-tests

* Depend only on aws-java-sdk-ec2 instead of umbrella aws-java-sdk (fixes #4382)
2017-06-09 14:32:07 -07:00
Niketh Sabbineni 2cd91b64d0 Uncompress streams without having to download to tmp first (#4364)
* Uncompress streams without having to download to tmp first

* Remove unused file
2017-06-08 18:08:38 -07:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Roman Leventov 63a897c278 Enable most IntelliJ 'Probable bugs' inspections (#4353)
* Enable most IntelliJ 'Probable bugs' inspections

* Fix in RemoteTestNG

* Fix IndexSpec's equals() and hashCode() to include longEncoding

* Fix inspection errors

* Extract global isntance of natural().nullsFirst(); address comments

* Fix

* Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections

* Prohibit Ordering.natural().nullsFirst() using Checkstyle
2017-06-07 09:54:25 -07:00