Commit Graph

108 Commits

Author SHA1 Message Date
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Parag Jain 83c6c48bed Fix state check bug in Kafka Index Task (#5204)
* fix state check for replacement task

* fix comments

* rebase with master
2018-01-08 18:01:36 -08:00
Jonathan Wei cdd374a417 Throw away rows with timestamps beyond long bounds in kafka indexing (#5215)
* Throw away rows with timestamps beyond long bounds in kafka indexing

* PR comments
2018-01-08 17:40:50 -06:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Jihoon Son 9199d61389 Automatic pendingSegments cleanup (#5149)
* PendingSegments cleanup

* fix build

* address comments

* address comments

* fix potential npe

* address comments

* fix build

* fix test

* fix test
2017-12-20 14:46:34 -08:00
Parag Jain c56a9807d4
prevent npe on mismatch between number of kafka partitions and task count (#5139) 2017-12-20 16:23:15 -06:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Parag Jain 677e24b760 prevent NPE from supressing actual exception (#5146) 2017-12-12 11:42:30 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implemetations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndoffset duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Gian Merlino 5da0241ac8
Kafka: Fixes needlessly low interpretation of maxRowsInMemory. (#5034)
AppenderatorImpl already applies maxRowsInMemory across all sinks. So dividing by
the number of Kafka partitions is pointless and effectively makes the interpretation
of maxRowsInMemory lower than expected.

This undoes one of the two changes from #3284, which fixed the original bug twice.
In this, that's worse than fixing it once.
2017-11-02 13:45:04 -06:00
Gian Merlino 6c725a7e06 Fix havingSpec on complex aggregators. (#5024)
* Fix havingSpec on complex aggregators.

- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
  related classes from druid-server to druid-processing. They probably make
  more sense there anyway.
- Adds a SQL query test.

Fixes #4957.

* Remove unused import.
2017-11-01 12:58:08 -04:00
Gian Merlino 0ce406bdf1
Introduce "transformSpec" at ingest-time. (#4890)
* Introduce "transformSpec" at ingest-time.

It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.

The "expression" fields use the same expression language as other
expression-based feature.

* Remove forbidden api.

* Fix compile error.

* Fix tests.

* Some more changes.

- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
  and ingestSegment firehose.

* Fix bad merge.

* Adjust imports.

* Adjust whitespace.

* Make Transform into an interface.

* Add missing annotation.

* Switch logger.

* Switch logger.

* Adjust test.

* Adjustment to handling for DatasourceIngestionSpec.

* Fix test.

* CR comments.

* Remove unused method.

* Add javadocs.

* More javadocs, and always decorate.

* Fix bug in TransformingStringInputRowParser.

* Fix bad merge.

* Fix ISFF tests.

* Fix DORC test.
2017-10-30 17:38:52 -07:00
elloooooo 52a162e302 define earlyMessegeRejectPeriod as the period after the taskduration (#4990) 2017-10-27 01:13:46 +05:30
Jihoon Son 8d9902831e Refactoring PrefetchableTextFilesFirehoseFactory (#4836)
* Refactoring prefetchable firehose

* Fix to read cache when prefetch is disabled

* More tests

* Cleanup codes

* Add Fetcher

* Fix test failure

* Count file size

* Fix test

* rename generic parameter

* address comments

* address comments

* reuse buffer

* move Execs to java-util

* use execs

* Fix build
2017-10-13 21:39:28 -05:00
Jihoon Son dfa9cdc982 Prioritized locking (#4550)
* Implementation of prioritized locking

* Fix build failure

* Fix tc fail

* Fix typos

* Fix IndexTaskTest

* Addressed comments

* Fix test

* Fix spacing

* Fix build error

* Fix build error

* Add lock status

* Cleanup suspicious method

* Add nullables

*  add doInCriticalSection to TaskLockBox and revert return type of task actions

* fix build

* refactor CriticalAction

* make replaceLock transactional

* fix formatting

* fix javadoc

* fix build
2017-10-11 23:16:31 -07:00
Jihoon Son 56fb11ce0b Lazy initialization for JavaScript functions (#4871)
* Lazy initialization of JavaScript functions

* Fix test failure

* Fix thread-safety and postpone js conf check

* Fix test fail

* Fix test

* Fix KafkaIndexTaskTest

* Move config check
2017-10-10 21:52:42 -07:00
Gian Merlino 1f2074c247 Bump versions in master to 0.11.1-SNAPSHOT. (#4878)
* Bump versions in master to 0.11.1-SNAPSHOT.

* Missed a few.
2017-09-28 17:09:51 -05:00
Gian Merlino bf8fd4c203 Add flattenSpec support to the Avro parser. (#4832)
* Add flattenSpec support to the Avro parser.

Also:

- Refactor the JSONPathParser a bit so it can share flattening code
  with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
  UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
  the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
  extension no longer uses it.

* Fix indentation.

* Fix equals/hashCode.
2017-09-26 09:26:06 -07:00
Parag Jain 07446ef32c warn if topic not found (#4834) 2017-09-25 12:21:46 +09:00
Roman Leventov e267f3901b Enforce Indentation with Checkstyle (#4799) 2017-09-21 13:06:48 -07:00
Jonathan Wei c2a0e753b6 Extension points for authentication/authorization (#4271)
* Extension points for authentication/authorization

* Address some PR comments

* Authorization result caching

* Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter

* Use Set for auth caching, close outputstreams in filters

* Don't close output stream on success in sanity check filter

* Add ConfigResourceFilter to coordinator lookups

* Fix filtering authorization check for empty resource list

* HttpClient users must explicitly escalate the client

* Remove response modification from PreResponseAuthorizationCheckFilter

* Remove extraneous pom.xml

* Fix unit test

* Better lifecycle management

* Rename AuthorizationManager to Authorizer

* Fix authorization denials for empty supervisor list

* Address some PR comments

* Address more PR comments

* Small cleanup

* Add Jetty HttpClient wrapper to Authenticator

* Remove Authorizer start/stop

* Restore immutable context map in DruidConnection, UT fix

* Fix/update docs

* Add authorization checks to EventReceiverFirehose

* Fix router authorization check failure, restore PreResponseAuthorizationFilter changes

* Compile fixes

* Test fixes

* Update Authenticator/Authorizer doc comments

* Merge fixes

* PR comments

* Fix test

* Fix IT

* More PR comments

* PR comments

* SSL fix
2017-09-15 23:45:48 -07:00
Himanshu 4c04083926 kafkaIndexTask unannounce service in final block (#4736) 2017-09-01 09:31:15 -07:00
Roman Leventov cbd1902db8 Add forbidden-apis plugin; prohibit using system time zone (#4611)
* Forbidden APIs WIP

* Remove some tests

* Restore io.druid.math.expr.Function

* Integration tests fix

* Add comments

* Fix in SimpleWorkerProvisioningStrategy

* Formatting

* Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest

* Address comments

* Fix GroupByMultiSegmentTest
2017-08-21 13:02:42 -07:00
Himanshu 74a64c88ab internal-discovery: interfaces for announcement/discovery, curator based impls (#4634)
* internal-discovery: interfaces for announcement/discovery, curator impls

* more tests

* address some review comments

* more fixes

* address more review comments

* simplify ObjectMapper setup in CuratorDruidNodeAnnouncerAndDiscoveryTest

* fix KafkaIndexTaskTest

* make lookupTier overridable via RealtimeIndexTask and KafkaIndexTask context

* make teamcity build happy
2017-08-16 13:07:16 -07:00
Parag Jain 725a144096 add localhost as advertised hostname (#4689)
* add localhost as advertised hostname

* set advertised.host.name to localhost for test kafka broker
2017-08-14 16:59:26 -07:00
Roman Leventov bf28d0775b Remove QueryRunner.run(Query, responseContext) and related legacy methods (#4482)
* Remove QueryRunner.run(Query, responseContext) and related legacy methods

* Remove local var
2017-08-11 09:12:38 +09:00
Yuewen Wang c821bc9a5a Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607)
* Implement "earlyMessageRejectionPeriod" config discussed in issue #4599
    * implement the logics of this param
    * Added doc of this config
    * Added unit tests of it

* Update KafkaSupervisor.java

ameliorate comment

* fix format

* fix bug when rebasing
2017-08-11 09:12:08 +09:00
Jihoon Son d5606bc558 Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549)
* Passing lockTimeout as a parameter for TaskLockbox.lock()

* Remove TIME_UNIT

* Fix tc fail

* Add taskLockTimeout to TaskContext

* Add caution
2017-08-08 18:21:07 -07:00
Roman Leventov f5d4171459 Prohibit for loops which could be foreach with IntelliJ (#4653)
* Replace for with foreach

* Replace for with for-each in GroupByQueryEngineV2

* Remove io.druid.collections.IntList
2017-08-08 18:05:33 -07:00
Roman Leventov aa7e4ae5e4 Enforce correct spacing with Checkstyle (#4651) 2017-08-05 10:18:25 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Chris Gavin 960cb07ea6 Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474)
* Remove some unnecessary use of boxed types.

* Fix some incorrect format strings.

* Enable IDEA's MalformedFormatString inspection.

* Add a Checkstyle check for finding uses of incorrect logging packages.

* Fix some incorrect usages of the metamx logger.

* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.

* Fix some more places where the wrong number of arguments are provided to format strings.

* Suppress `MalformedFormatString` inspection on legacy logging test.

* Use @SuppressWarnings rather than a noinspection suppression comment.

* Fix some more incorrect format strings.

* Suppress some more incorrect format string warnings where the incorrect string is intentional.

* Log the aggregator when closing it fails.

* Remove some unneeded log lines.
2017-07-13 12:15:32 -07:00
Roman Leventov b2865b7c7b Make possible to start Peon without DI loading of any querying-related stuff (#4516)
* Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory

* Extract QueryablePeonModule and add druid.modules.excludeList config

* Typo
2017-07-12 13:18:25 -05:00
Akash Dwivedi 5f411f14af Timeout for LockAcquireAction (#4461)
* Timeout for LockAcquireAction

* Static inner class.

* Rebase changes.

* makeAlert and throw exception incase of overlapping interval.

* Addressed comments.

* remove unused import.

* Addressed comments
2017-07-11 18:59:32 +09:00
Jihoon Son cc20260078 Early publishing segments in the middle of data ingestion (#4238)
* Early publishing segments in the middle of data ingestion

* Remove unnecessary logs

* Address comments

* Refactoring the patch according to #4292 and address comments

* Set the total shard number of NumberedShardSpec to 0

* refactoring

* Address comments

* Fix tests

* Address comments

* Fix sync problem of committer and retry push only

* Fix doc

* Fix build failure

* Address comments

* Fix compilation failure

* Fix transient test failure
2017-07-10 22:35:36 -07:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Roman Leventov 5285eb961b Update dependencies (#4313)
* Update dependencies

* Downgrade curator

* Rollback aws-java-sdk dependency to 1.10.77

* Revert exclusions in integration-tests

* Depend only on aws-java-sdk-ec2 instead of umbrella aws-java-sdk (fixes #4382)
2017-06-09 14:32:07 -07:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
David Lim 13ecf90923 Report Kafka lag information in supervisor status report (#4314)
* refactor lag reporting and report lag at status endpoint

* refactor offset reporting logic to fetch offsets periodically vs. at request time

* remove JavaCompatUtils

* code review changes

* code review changes
2017-06-05 13:26:25 -07:00
Jihoon Son da32e1ae53 Reducing testing time for KafkaIndexTaskTest and KafkaSupervisorTest (#4352) 2017-06-03 00:53:07 +09:00
Jihoon Son f876246af7 Rename FiniteAppenderatorDriver to AppenderatorDriver (#4356) 2017-06-03 00:48:44 +09:00
Jihoon Son 1150bf7a2c Refactoring Appenderator Driver (#4292)
* Refactoring Appenderator

1) Added publishExecutor and handoffExecutor for background publishing and handing segments off
2) Change add() to not move segments out in it

* Address comments

1) Remove publishTimeout for KafkaIndexTask
2) Simplifying registerHandoff()
3) Add increamental handoff test

* Remove unused variable

* Add persist() to Appenderator and more tests for AppenderatorDriver

* Remove unused imports

* Fix strict build

* Address comments
2017-06-02 07:09:11 +09:00