Commit Graph

8820 Commits

Author SHA1 Message Date
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Clint Wylie 62677212cc Order rows during incremental index persist when rollup is disabled. (#6107)
* order using IncrementalIndexRowComparator at persist time when rollup is disabled, allowing increased effectiveness of dimension compression, resolves #6066

* fix stuff from review
2018-08-06 14:17:48 -07:00
Jihoon Son ef2d6e9118
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 (#6086)
* Fix TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2

* Make the priority of taskLock nullable

* fix test

* fix build
2018-08-03 17:13:44 -07:00
chengchengpei 914058f288 validate baseDataSource non-empty string in DerivativeDataSourceMetad… (#6072)
* validate baseDataSource non-empty string in DerivativeDataSourceMetadata; added a test

* added another test for validating null baseDataSource

* restored the arguments check order
2018-08-03 13:38:18 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Jonathan Wei b9c445c780
Optimize filtered aggs with interval filters in per-segment queries (#5857)
* Optimize per-segment queries

* Always optimize, add unit test

* PR comments

* Only run IntervalDimFilter optimization on __time column

* PR comments

* Checkstyle fix

* Add test for non __time column
2018-08-01 14:39:38 -07:00
Andrés Gómez e270362767 Add stringLast and stringFirst aggregators extension (#5789)
* Add lastString and firstString aggregators extension

* Remove duplicated class

* Move first-last-string doc page to extensions-contrib

* Fix ObjectStrategy compare method

* Fix doc bad aggregatos type name

* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery

* Add getMaxStringBytes() method to support JSON serialization

* Fix null pointer exception at segment creation phase when the string value is null

* Control the valueSelector object class on BufferAggregators

* Perform all improvements

* Add java doc on SerializablePairLongStringSerde

* Refactor ObjectStraty compare method

* Remove unused ;

* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests

* Remove unused imports

* Add license header

* Add class name to java doc class serde

* Throw exception if value is unsupported class type

* Move first-last-string extension into druid core

* Update druid core docs

* Fix null pointer exception when pair->string is null

* Add null control unit tests

* Remove unused imports

* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query

* Change SerializablePairLongString to extend SerializablePair

* Change vars from public to private

* Convert vars to primitive type

* Clarify compare comment

* Change IllegalStateException to ISE

* Remove TODO comments

* Control possible null pointer exception

* Add @Nullable annotation

* Remove empty line

* Remove unused parameter type

* Improve AggregatorCombiner javadocs

* Add filterNullValues option at StringLast and StringFirst aggregators

* Add filterNullValues option at agg documentation

* Fix checkstyle

* Update header license

* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR

* Fix StringFirstAggregatorCombiner

* Fix if condition at StringFirstAggregateCombiner

* Remove filterNullValues from string first/last aggregators

* Add isReset flag in FirstAggregatorCombiner

* Change Arrays.asList to Collections.singletonList
2018-08-01 10:52:54 -07:00
Clint Wylie 297810e7a4 log correct moved count on balance instead of snapshot of currently moving (#6032) 2018-08-01 03:36:10 -07:00
Roman Leventov 0754d78a2e Prohibit Lists.newArrayList() with a single argument (#6068)
* Prohibit Lists.newArrayList() with a single argument

* Test fixes

* Add Javadoc to Node constructor
2018-07-31 20:09:10 -07:00
Jonathan Wei 91943a24db
Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask (#6065)
* Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask

* Make recursive
2018-07-31 15:30:44 -07:00
Benedict Jin ea72907365 Add the '--fail-at-end' option to maven command for 'strictly compiled' part (#6078) 2018-07-31 13:29:25 -07:00
Jonathan Wei bb2fd57c0b
Revert "skip travis on doc only changes" (#6077)
* Revert "Fix a bug in GroupByQueryEngine (#6062)"

This reverts commit f3595c93d9.

* Revert "Add definition of 'NONE' to queryGranularity in ingestion.index doc (#6073)"

This reverts commit 7f89c72932.

* Revert "skip travis on doc only changes (#6061)"

This reverts commit 66af403f7d.
2018-07-31 12:56:57 -07:00
Clint Wylie 20ae8aa626 Fix 'auto' encoded longs + compression serializer (#6045)
* Fix 'auto' encoded longs + compression serializer
Fixes #6044

changes:
* Fixes `VSizeLongSerde` serializers to treat 'close' as 'flush' when used with `BlockLayoutColumnarLongsSerializer`, allowing unwritten values to be flushed to the buffer when the block is compressed
* Add exhaustive unit test that flexes a variety of value sizes, row counts, and compression strategies to catch issues such as these
:

* refactor LongSerializer close to be named flush instead

* revert and just make new serializers per block
2018-07-30 18:35:20 -07:00
Gian Merlino 3aa7017975 Remove some unnecessary task storage internal APIs. (#6058)
* Remove some unnecessary task storage internal APIs.

- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.

This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.

IMO, this is an improvement, since the most likely reason for bad tasks is either:

- A new version introduced an additional validation, and a pre-existing task doesn't
  pass it.
- You are rolling back from a newer version to an older version.

In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.

* Remove unused import.

* Fix formatting.

* Fix formatting.
2018-07-30 18:35:06 -07:00
Roman Leventov f3595c93d9 Fix a bug in GroupByQueryEngine (#6062) 2018-07-30 14:39:38 -07:00
Caroline1000 7f89c72932 Add definition of 'NONE' to queryGranularity in ingestion.index doc (#6073)
* Add meaning of granularity = None to queryGranularity

* Fix format
2018-07-30 14:07:33 -07:00
Clint Wylie 66af403f7d skip travis on doc only changes (#6061)
* skip travis on doc only changes

* more selective ignoring of examples folder
2018-07-30 13:36:49 -07:00
Gian Merlino 63be028cee
CompactionTask: Reject empty intervals on construction. (#6059)
* CompactionTask: Reject empty intervals on construction.

They don't make sense anyway, and it's better to fail fast.

* Switch API.
2018-07-30 08:52:50 -07:00
Gian Merlino c57f4a5db0
FinalizingFieldAccessPostAggregator: Fix serde. (#6067)
Fixes #6063.
2018-07-28 08:44:22 -07:00
Stas Sukhanov b0ecfee1ab Fix ClassNotFoundException in druid-kerberos extension (#4776)
Class org.apache.hadoop.conf.Configuration inside extensions should be used with caution.
By default, the configuration uses the context class loader of the current thread set to the
class loader used to load the application. Because of isolation between the application and
extensions we must explicitely set the class loader to extension class loader to be able
load classes specified in hadoop configuration file.
2018-07-27 16:23:09 -07:00
Benedict Jin 331a0afb98 Remove redundant type parameters and enforce some other style and inspection rules (#5980)
* Various changes about druid-services module

* Patch improvements from reviewer

* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile

* Fix ArraysAsListWithZeroOrOneArgument

* Fix conflict

* Fix ToArrayCallWithZeroLengthArrayArgument

* Fix AliEqualsAvoidNull

* Remove blank line

* Remove unused import clauses

* Fix code style in TopNQueryRunnerTest

* Fix conflict

* Don't use Collections.singletonList when converting the type of array type

* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase

* Roll back the latest commit

* Add java.io.File#toURL() into druid-forbidden-apis

* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord

* Add a new regexp element into stylecode xml file

* Fix style error for new regexp

* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING

* Fix style error for new regexp

* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile

* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR

* Add toArray(new Object[0]) regexp into checkstyle config file & fix them

* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it

* Add a comment for string equals regexp in checkstyle config

* Fix code format

* Add RedundantTypeArguments as ERROR level inspection

* Fix cannot resolve symbol datasource
2018-07-27 16:56:49 -05:00
Surekha 74ae73df42 Add the correct createdTime to waiting tasks (#6060) 2018-07-27 14:56:19 -07:00
Jihoon Son 1524af703d
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050) 2018-07-27 10:43:32 -07:00
Eyal Yurman 94d6c9a0a5 Remove JDK 7 from build documentation. (#6031)
See issue #6030
2018-07-26 17:05:07 -07:00
kaijianding 7919e4d5df move rangeSet compare into shardspec (#5688) 2018-07-26 14:17:57 -07:00
Diego Costa e4ef753a60 Use Blackhole objects to sink the method return in a benchmak loop (JMH) (#5908)
* Remove the unsafe loop from the benchmark

* Removing unused variables
2018-07-25 12:05:32 -07:00
Jihoon Son 5ee7b0cada Synchronize scheduled poll() calls in SQLMetadataSegmentManager (#6041)
Similar issue to https://github.com/apache/incubator-druid/issues/6028.
2018-07-24 22:57:30 -05:00
Roman Leventov 7d5eb0c21a Synchronize scheduled poll() calls in SQLMetadataRuleManager to prevent flakiness in SqlMetadataRuleManagerTest (#6033) 2018-07-24 12:00:48 -07:00
Benedict Jin 49dca01b56 Add IRC#druid-dev shields.io into README (#6002) 2018-07-21 13:31:35 -07:00
Jonathan Wei 0590293538
Add comment and code tweak to Basic HTTP Authenticator (#6029) 2018-07-20 20:35:14 -07:00
Jonathan Wei efab3b0160 Add concat and textcat SQL functions (#6005) 2018-07-20 11:21:04 -07:00
Jihoon Son b7d42edb0f Check the kafka topic when compacring checkpoints from tasks and the one stored in metastore (#6015) 2018-07-20 11:20:23 -07:00
Surekha 414487a78e Add support to filter on datasource for active tasks (#5998)
* Add support to filter on datasource for active tasks

* Added datasource filter to sql query for active tasks
* Fixed unit tests

* Address PR comments
2018-07-19 16:33:46 -07:00
Jihoon Son 4a2df2b23a Log the full stack trace when an HTTP request fails (#6022) 2018-07-19 12:05:46 -07:00
Gian Merlino cd8ea3da8d
SQL: Add server-wide default time zone config. (#5993)
* SQL: Add server-wide default time zone config.

* Switch API.
2018-07-18 13:12:40 -07:00
Fangjin Yang 2f51a3c673
Update readme (#6011)
* update readme

* add disclaimer
2018-07-17 11:58:50 -07:00
Charles Allen e206f8ca98 Move build badge to https://travis-ci.org/apache/incubator-druid (#6007) 2018-07-17 13:57:33 -05:00
Jihoon Son c48aa74a30 Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996)
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
2018-07-13 17:14:57 -07:00
Clint Wylie 31c2179fe1 Coordinator fix balancer stuck (#5987)
* this will fix it

* filter destinations to not consider servers already serving segment

* fix it

* cleanup

* fix opposite day in ImmutableDruidServer.equals

* simplify
2018-07-11 20:19:11 -07:00
Caroline1000 5f78a333ad show that flatten will also work with avro extension (#5874)
* show that flatten will also work with avro extension

* fix url
2018-07-11 16:47:03 -07:00
Clint Wylie ac194cc082 Coordinator fix exception caused by additional logging (#5988)
* fix explosion in curator load queue peon caused by additional logging, as well as annoying chatty log

* remove log message
2018-07-11 16:13:32 -07:00
Caroline1000 153eb26262 fix link to query-context in broker config doc (#5995) 2018-07-11 15:57:08 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Benedict Jin b3021ec802 Fix bug in SegmentAnalyzer.analyzeComplexColumn() #5939 (#5954) 2018-07-09 15:36:16 -07:00
Surekha 441c9819d9 Support limit for timeseries query (#5894) (#5931)
* Support limit for timeseries query (#5894)

* Fix tests

* Address PR comments

* Try to fix teamcity inspection checks

* Remove unused method from VirtualColumns

* Remove unused import statement
2018-07-09 08:58:42 -07:00
Jihoon Son d1d9358274 Increase timeout for BlockingPoolTest (#5959) 2018-07-06 16:34:53 -07:00
Caroline1000 b3976050ad add definition of balancerComputeThreads (#5865) 2018-07-05 09:54:36 -07:00
Caroline1000 ee4a5aafb0 add config values for GCS deep storage (#5875)
* add config values for GCS deep storage

* fix config values for GCS deep storage
2018-07-05 09:53:41 -07:00