Commit Graph

8771 Commits

Author SHA1 Message Date
Jonathan Wei 00b0a156e9 Tweak isInvalidRows behavior in HadoopTuningConfig (#6339)
* Tweak isInvalidRows behavior in HadoopTuningConfig

* Fix tests
2018-09-24 16:13:13 -07:00
Alexander Saydakov 93345064b5 HllSketch module (#5712)
* HllSketch module

* updated license and imports

* updated package name

* implemented makeAggregateCombiner()

* removed json marks

* style fix

* added module

* removed unnecessary import, side effect of package renaming

* use ThreadLocalRandom

* addressing code review points, mostly formatting and comments

* javadoc

* natural order with nulls

* typo

* factored out raw input value extraction

* singleton

* style fix

* style fix

* use Collections.singletonList instead of Arrays.asList

* suppress warning
2018-09-24 08:41:56 -07:00
Roman Leventov 9a3195e98c Improve interning in SQLMetadataSegmentManager (#6357)
* Improve interning in SQLMetadataSegmentManager

* typo
2018-09-22 13:23:30 -07:00
Jonathan Wei 364bf9d1f9 Fix non org.apache.druid files and add package name checkstyle rule (#6367)
* Fix non org.apache.druid files and add package name checkstyle rule

* PR comment
2018-09-21 17:58:19 -07:00
Clint Wylie 399a5659b2 fix incorrect precondition check in `SupervisorManager.suspendOrResumeSupervisor` (#6364)
This check is the reverse of what was intended
2018-09-21 17:40:14 -07:00
Jonathan Wei f12ffd19a8 Add Kafka reset instructions for tutorial (#6362) 2018-09-21 14:18:31 -07:00
QiuMM 255214cbe6 correct variable name in KafkaSupervisor (#6354) 2018-09-20 16:22:03 -07:00
Gian Merlino e1c649e906 Add metadata indexes to help with segment allocation. (#6348)
Segment allocation queries can take a long time (10s of seconds) when
you have a lot of segments. Adding these indexes helps greatly.
2018-09-19 15:54:13 -07:00
Jonathan Wei 8972244c68 Mutual TLS support (#6076)
* Mutual TLS support

* Kafka test fixes

* TeamCity fix

* Split integration tests

* Use localhost DOCKER_IP

* Increase server thread count

* Increase SSL handshake timeouts

* Add broken pipe retries, use injected client config params

* PR comments, Rat license check exclusion
2018-09-19 09:56:15 -07:00
Joshua Sun 4fafc2ccc9 fixes race condition in kafkasupervisor (#6304)
* fixes race condition in kafkasupervisor

* async verify checkpoints

* fixes race condition in kafkasupervisor

* replace commonly used methods with variables

* remove countdownlatch import

* reformat

* fixes
2018-09-18 22:37:22 -07:00
Jonathan Wei 2e82edc5e0 More exclusions for Rat license check (#6346) 2018-09-18 20:47:56 -07:00
Slim Bouguerra 028354eea8 Adding licenses and enable apache-rat-plugin. (#6215)
* Adding licenses and enable apache-rat-plugin.

Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18

* restore the copyright of demo_table and add it to the list of allowed ones

Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14

* review comments

Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca

* more fixup

Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2

* align

Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
2018-09-18 08:39:26 -07:00
Jonathan Wei 609da01882 Fix dictionary ID race condition in IncrementalIndexStorageAdapter (#6340)
Possibly related to https://github.com/apache/incubator-druid/issues/4937

--------

There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following, when running queries with filters on String dimensions that hit realtime tasks: 

```
org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
	at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
	at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
```

When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, `StringDimensionIndexer.makeDimensionSelector` gets called eventually, which calls:

```
final int maxId = getCardinality();
...

  @Override
  public int getCardinality()
  {
    return dimLookup.size();
  }
```

So `maxId` is set to the size of the dictionary at the time that the `filterMatcher` is created.

However, the `maxRowIndex` which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049) is set after the `filterMatcher` is created.

If rows with new dictionary values are added after the `filterMatcher` is created but before `maxRowIndex` is set, then it is possible for the Cursor to return rows that contain the new values, which will have `id >= maxId`.

This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows with unknown dictionary IDs from being passed to the `filterMatcher`.
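
The ordering argument above can be illustrated with a minimal, single-threaded sketch. It is not Druid's code: `SnapshotOrderSketch`, `addRow`, and `cursorIsConsistent` are hypothetical names, the "concurrent" write is simulated by a callback that runs between the two snapshot steps, and `maxRowIndex` is modeled here as an exclusive row count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the race; not Druid's actual classes.
final class SnapshotOrderSketch
{
  private final List<String> dictionary = new ArrayList<>(); // id -> value
  private final List<Integer> rows = new ArrayList<>();      // each row holds a dictionary id

  public void addRow(String value)
  {
    int id = dictionary.indexOf(value);
    if (id < 0) {
      dictionary.add(value);
      id = dictionary.size() - 1;
    }
    rows.add(id);
  }

  // Returns true if every row visible to the cursor has id < maxId.
  public boolean cursorIsConsistent(boolean rowBoundFirst, Runnable concurrentWrite)
  {
    final int maxRowIndex;
    final int maxId;
    if (rowBoundFirst) {
      maxRowIndex = rows.size();   // fixed ordering: bound the visible rows first
      concurrentWrite.run();       // a write lands between the two steps
      maxId = dictionary.size();
    } else {
      maxId = dictionary.size();   // buggy ordering: dictionary size captured first
      concurrentWrite.run();
      maxRowIndex = rows.size();
    }
    for (int i = 0; i < maxRowIndex; i++) {
      if (rows.get(i) >= maxId) {
        return false;              // a visible row carries an unknown dictionary id
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    SnapshotOrderSketch buggy = new SnapshotOrderSketch();
    buggy.addRow("x");
    System.out.println(buggy.cursorIsConsistent(false, () -> buggy.addRow("y")));

    SnapshotOrderSketch fixed = new SnapshotOrderSketch();
    fixed.addRow("x");
    System.out.println(fixed.cursorIsConsistent(true, () -> fixed.addRow("y")));
  }
}
```

With the row bound captured first, any row added afterwards is invisible to the cursor, so a dictionary that grows later can only make `maxId` larger than needed, never smaller.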

-----------

The included test triggers the error with a custom Filter + DruidPredicateFactory.

The DimensionSelector for predicate-based filter matching is created here in `Filters.makeValueMatcher`:

```
  public static ValueMatcher makeValueMatcher(
      final ColumnSelectorFactory columnSelectorFactory,
      final String columnName,
      final DruidPredicateFactory predicateFactory
  )
  {
    final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName);

    // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
    if (capabilities != null && capabilities.getType() == ValueType.LONG) {
      return getLongPredicateMatcher(
          columnSelectorFactory.makeColumnValueSelector(columnName),
          predicateFactory.makeLongPredicate()
      );
    }

    final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
        DimensionHandlerUtils.createColumnSelectorPlus(
            ValueMatcherColumnSelectorStrategyFactory.instance(),
            DefaultDimensionSpec.of(columnName),
            columnSelectorFactory
        );

    return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory);
  }
```

The test Filter adds a row to the IncrementalIndex in the test when the predicateFactory creates a new String predicate, after `DimensionHandlerUtils.createColumnSelectorPlus` is called.
2018-09-18 10:43:29 +04:00
Dayue Gao edf0c13807 add a sql option to force user to specify time condition (#6246)
* add a sql option to force user to specify time condition

* rename forceTimeCondition to requireTimeCondition, refine error message
2018-09-17 13:52:24 -07:00
Hongze Zhang 2fac6743d4 Add maxIdleTime option to EventReceiverFirehose (#5997) 2018-09-17 13:50:56 -07:00
QiuMM dabaf4caf8 fix NoClassDefFoundError when using SysMonitor (#6300) 2018-09-14 14:47:15 -07:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00
QiuMM 288aa4d504 Add missing metadata table information in docs (#6309)
* Add missing metadata table information in doc file

* address review comment
2018-09-14 12:17:05 -07:00
QiuMM 85391e9fb3 fix opentsdb emitter always be running and fail sending tags whose value contains colon (#6251)
* fix opentsdb emitter always be running

* check if emitter started

* add more details about consumeDelay in doc

* fix possible thread-safety issue

* fix failure to send tags whose values contain a colon
2018-09-14 12:14:15 -07:00
QiuMM 87ccee05f7 Add ability to specify list of task ports and port range (#6263)
* support specify list of task ports

* fix typos

* address comments

* remove druid.indexer.runner.separateIngestionEndpoint config

* tweak doc

* fix doc

* code cleanup

* keep some useful comments
2018-09-13 19:36:04 -07:00
Roman Leventov d50b69e6d4 Prohibit LinkedList (#6112)
* Prohibit LinkedList

* Fix tests

* Fix

* Remove unused import
2018-09-13 18:07:06 -07:00
Jonathan Wei fd6786ac6c Fix endpoint permissions section in basic-security docs (#6331) 2018-09-13 15:23:41 -07:00
Clint Wylie 91a37c692d 'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234)
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shut down the supervisor and its tasks, update its `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved its functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbiage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
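
As a rough illustration of the endpoints listed above, the following sketch only builds the request URIs; it does not send anything. The class name `SupervisorEndpoints`, the base URL `http://localhost:8090`, and the supervisor id `wiki` are placeholders chosen for the example, and the id is assumed to be URL-safe. The HTTP method to use is not stated in this commit message, so it is left out.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper that assembles the supervisor endpoint paths named in the commit message.
final class SupervisorEndpoints
{
  enum Action { SUSPEND, RESUME, TERMINATE }

  private static final String BASE_PATH = "/druid/indexer/v1/supervisor";

  // e.g. <overlord>/druid/indexer/v1/supervisor/{id}/suspend
  public static URI actionUri(String overlordBase, String supervisorId, Action action)
  {
    String verb = action.name().toLowerCase(Locale.ROOT);
    return URI.create(overlordBase + BASE_PATH + "/" + supervisorId + "/" + verb);
  }

  // The 'full' query parameter asks for a list of {"id": ..., "spec": ...} objects.
  public static URI listFullUri(String overlordBase)
  {
    return URI.create(overlordBase + BASE_PATH + "?full");
  }

  public static void main(String[] args)
  {
    String overlord = "http://localhost:8090"; // placeholder address, an assumption
    System.out.println(actionUri(overlord, "wiki", Action.SUSPEND));
    System.out.println(actionUri(overlord, "wiki", Action.RESUME));
    System.out.println(listFullUri(overlord));
  }
}
```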

* move overlord console status to own column in supervisor table so does not look like garbage

* spacing

* padding

* other kind of spacing

* fix rebase fail

* fix more better

* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests

* fix log
2018-09-13 14:42:18 -07:00
Clint Wylie 96a1076e23 allow 3 retries for failing tests (#6324)
* allow 1 retry for failing tests idk if this is a good idea, but false failure rate due to flaky tests seems pretty bad lately

* try to fix retry issue with teardown

* Update pom.xml

* Update pom.xml
2018-09-11 19:16:59 -07:00
Gian Merlino 7f3a0dae28 ParseSpec: Remove default setting. (#6310)
* ParseSpec: Remove default setting.

Having a default ParseSpec implementation is bad for users, because it masks
problems specifying the format. Two common problems masked by this are specifying
the "format" at the wrong level of the JSON, and specifying a format that
Druid doesn't support. In both cases, having a default implementation means that
users will get the delimited parser rather than an error, and then be confused
when, later on, their data failed to parse.

* Fix integration tests.
2018-09-11 19:16:19 -07:00
Gian Merlino d6cbdf86c2 Broker backpressure. (#6313)
* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.
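
The idea of a per-query byte limit can be sketched as follows. This is a generic illustration of byte-bounded queueing, not the broker's actual implementation: `ByteBoundedQueueSketch` and its methods are hypothetical, and the only name taken from the commit message is `maxQueuedBytes`.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: a queue that tracks queued bytes and signals when the reader should pause.
final class ByteBoundedQueueSketch
{
  private final long maxQueuedBytes;
  private final Queue<byte[]> chunks = new ArrayDeque<>();
  private long queuedBytes = 0;

  public ByteBoundedQueueSketch(long maxQueuedBytes)
  {
    this.maxQueuedBytes = maxQueuedBytes;
  }

  // Called by the network side when a response chunk arrives.
  public synchronized void add(byte[] chunk)
  {
    chunks.add(chunk);
    queuedBytes += chunk.length;
  }

  // Called by the query side when it consumes a chunk; returns null if the queue is empty.
  public synchronized byte[] poll()
  {
    byte[] chunk = chunks.poll();
    if (chunk != null) {
      queuedBytes -= chunk.length;
    }
    return chunk;
  }

  // The network side would stop reading from the channel while this returns false.
  public synchronized boolean canAcceptMore()
  {
    return queuedBytes < maxQueuedBytes;
  }

  public static void main(String[] args)
  {
    ByteBoundedQueueSketch queue = new ByteBoundedQueueSketch(10);
    queue.add(new byte[6]);
    queue.add(new byte[6]);
    System.out.println("read more? " + queue.canAcceptMore());
    queue.poll();
    System.out.println("read more? " + queue.canAcceptMore());
  }
}
```

In this sketch the limit is checked after a chunk is queued, so the queue can briefly exceed the limit by one chunk; whether the real property behaves the same way is not stated here.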

* Fix query context doc.
2018-09-10 09:33:29 -07:00
Gian Merlino 4669f0878f SQL: UNION ALL operator. (#6314)
* SQL: UNION ALL operator.

* Remove unused import.
2018-09-09 22:32:56 -07:00
Clint Wylie e6e068ce60 Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129)
* resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask

* address review comments

* changes due to review

* merge fail
2018-09-07 13:17:49 -07:00
Clint Wylie e095f63e8e fix coordinator console loading (#6276) 2018-09-06 16:59:51 -07:00
Jonathan Wei 60cbc64472 Use PasswordProvider, fix info on initial passwords in basic security extension docs (#6303)
* Fix info on initial passwords in basic security extension docs

* Use PasswordProvider

* Compile fix
2018-09-05 17:07:16 -07:00
Himanshu d61f708ef5 make COMPLEX column optionally filterable in Druid code (#6223)
* make COMPLEX column filterable in Druid code

* Revert "make COMPLEX column filterable in Druid code"

This reverts commit 9fc6ec768c.

* complex columns can be optionally made filterable

* some types are always filterable

* add ColumnCapabilitiesImpl serde tests

* add SuppressWarnings annotation
2018-09-05 12:28:49 -07:00
Gian Merlino be6c901114 Like filter: Fix escapes escaping themselves. (#6295)
Escapes should escape themselves.
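
A minimal sketch of the rule "escapes should escape themselves", assuming a LIKE pattern is translated to a regular expression. `LikeEscapeSketch` is a hypothetical class, not Druid's LIKE filter code, and a trailing lone escape character is simply ignored here.

```java
import java.util.regex.Pattern;

// Hypothetical LIKE-to-regex translation in which the escape character can escape itself.
final class LikeEscapeSketch
{
  public static Pattern compile(String likePattern, char escape)
  {
    StringBuilder regex = new StringBuilder();
    boolean escaping = false;
    for (int i = 0; i < likePattern.length(); i++) {
      char c = likePattern.charAt(i);
      if (escaping) {
        // Whatever follows the escape is a literal, including the escape character itself.
        regex.append(Pattern.quote(String.valueOf(c)));
        escaping = false;
      } else if (c == escape) {
        escaping = true;
      } else if (c == '%') {
        regex.append(".*");  // any run of characters
      } else if (c == '_') {
        regex.append('.');   // exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  public static void main(String[] args)
  {
    // Pattern text is a\\% : 'a', an escaped backslash, then a wildcard.
    Pattern p = compile("a\\\\%", '\\');
    System.out.println(p.matcher("a\\xyz").matches());
    System.out.println(p.matcher("a%").matches());
  }
}
```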
2018-09-05 09:29:07 -07:00
Jonathan Wei 4caa61d8fa Fix tutorial sample data filename, fix logger classname in metrics docs (#6299) 2018-09-04 21:47:12 -07:00
QiuMM 84810f6358 correct metric name in emitter configuration files (#6290) 2018-09-04 14:23:04 -07:00
adursun 71ac3ada21 Fix link related to metadata storage (#6294) 2018-09-04 14:20:57 -07:00
Eyal Yurman 10ca290d64 Correct file name typo in Quickstart tutorial (#6297)
Correct name wikipedia-2015-09-12-sampled.json.gz to wikiticker-2015-09-12-sampled.json.gz
2018-09-04 14:20:17 -07:00
Jonathan Wei 180e3ccfad Docs consistency cleanup (#6259) 2018-09-04 12:54:41 -07:00
Dayue Gao 743547fc3b Unauthorized sql request should return 403 (#6279) 2018-09-01 09:17:18 -07:00
Jonathan Wei d0fb83760e Fix PostgreSQLConnectorConfig binding (#6273) 2018-08-31 14:18:29 -07:00
Dayue Gao 951b36e2bc BytesFullResponseHandler should only consume readableBytes of ChannelBuffer (#6270) 2018-08-30 20:22:08 -07:00
QiuMM 9b04846e6b correct metric name in doc file (#6271) 2018-08-30 10:57:35 -07:00
Gian Merlino 431d3d8497 Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Himanshu 1fae6513e1 add "subtotalsSpec" attribute to groupBy query (#5280)
* add subtotalsSpec attribute to groupBy query

* don't send subtotalsSpec to downstream nodes from broker, and other updates

* address review comment

* fix checkstyle issues after merge to master

* add docs for subtotalsSpec feature

* address doc review comments
2018-08-28 17:46:38 -07:00
Dayue Gao fcf8c8d53c RowBasedKeySerde should use empty dictionary in constructor (#6256) 2018-08-28 17:22:18 -07:00
Jonathan Wei c9a27e3e8e Don't let catch/finally suppress main exception in IncrementalPublishingKafkaIndexTaskRunner (#6258) 2018-08-28 16:12:02 -07:00
Gian Merlino 80224df36a SQL: Fix post-aggregator naming logic for sort-project. (#6250)
The old code assumes that post-aggregator prefixes are one character
long followed by numbers. This isn't always true (we may pad with
underscores to avoid conflicts). Instead, the new code uses a different
base prefix for sort-project postaggregators ("s" instead of "p") and
uses the usual Calcites.findUnusedPrefix function to avoid conflicts.
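
The conflict-avoidance idea can be sketched as below. This is a standalone illustration written for this note, not the body of `Calcites.findUnusedPrefix`: the class `UnusedPrefixSketch` and the choice to pad on the left with underscores are assumptions based only on the description above.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: pad a base prefix with underscores until no existing name starts with it.
final class UnusedPrefixSketch
{
  public static String findUnusedPrefix(String basePrefix, NavigableSet<String> existingNames)
  {
    String prefix = basePrefix;
    while (anyStartsWith(existingNames, prefix)) {
      prefix = "_" + prefix;
    }
    return prefix;
  }

  private static boolean anyStartsWith(NavigableSet<String> names, String prefix)
  {
    // In sorted order, names starting with the prefix begin at the first name >= prefix,
    // so checking that single candidate is enough.
    String candidate = names.ceiling(prefix);
    return candidate != null && candidate.startsWith(prefix);
  }

  public static void main(String[] args)
  {
    NavigableSet<String> names = new TreeSet<>(Arrays.asList("a0", "p0", "p1"));
    System.out.println(findUnusedPrefix("s", names));
    System.out.println(findUnusedPrefix("p", names));
  }
}
```

Because the result is checked against the existing names rather than parsed, nothing depends on prefixes being exactly one character long.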
2018-08-28 10:59:32 -07:00
Dayue Gao a879022bc8 fix AssertionError of semi join query (#6244) 2018-08-27 17:49:51 -07:00
Jim Slattery d957295b98 spelling: storage (#6248) 2018-08-27 16:35:31 -07:00
Dayue Gao 2325844a38 fix incorrect check of maxSemiJoinRowsInMemory (#6242) 2018-08-27 16:28:29 -07:00
Gian Merlino 4a8b09b6a9 Fix NPE on constant null numeric expressions. (#6232)
The bug was caused by makeExprEvalSelector returning a null object, which
it isn't supposed to do. Fixed this by renaming ConstantColumnValueSelector
to ConstantExprEvalSelector (it was only used for ExprEval anyway) and
putting logic in that class to make sure the selectors behave as expected.
2018-08-27 15:30:56 -07:00