Commit Graph

149 Commits

Author SHA1 Message Date
Jihoon Son e87d6e32b3 Support kinesis compatibility (#7351) 2019-04-08 19:19:34 -07:00
Jihoon Son 5294277cb4
Fix exclusive start partitions for sequenceMetadata (#7339)
* Fix exclusvie start partitions for sequenceMetadata

* add empty check
2019-03-26 14:39:07 -07:00
Jihoon Son 0c5dcf5586 Fix exclusivity for start offset in kinesis indexing service & check exclusivity properly in IndexerSQLMetadataStorageCoordinator (#7291)
* Fix exclusivity for start offset in kinesis indexing service

* some adjustment

* Fix SeekableStreamDataSourceMetadata

* Add missing javadocs

* Add missing comments and unit test

* fix SeekableStreamStartSequenceNumbers.plus and add comments

* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue

* Add javadocs

* fix compilation

* fix test

* remove unused variable
2019-03-21 13:12:22 -07:00
Surekha e170203876 Consolidate kafka consumer configs (#7249)
* Consolidate kafka consumer configs

* change the order of adding properties

* Add consumer properties to fix test

it seems kafka consumer does not reveive any message without these configs

* Use KafkaConsumerConfigs in integration test

* Update zookeeper and kafka versions in the setup.sh for the base druid image

*  use version 0.2 of base druid image

* Try to fix tests in KafkaRecordSupplierTest

* unused import

* Fix tests in KafkaSupervisorTest
2019-03-21 11:19:49 -07:00
Gian Merlino a8c7132482 Logic adjustments to SeekableStreamIndexTaskRunner. (#7267)
* Logic adjustments to SeekableStreamIndexTaskRunner.

A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:

- Remove "initialOffsetsSnapshot", which was used to determine when to
  skip start offsets. Replace it with "lastReadOffsets", which I hope
  is more intuitive. (There is a connection: start offsets must be
  skipped if and only if they have already been read, either by a
  previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
  be the opposite of isEndOffsetExclusive. The reason is that starts are
  exclusive exactly when the prior ends are inclusive: they must match
  up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
  no reason to, since we expect to read continuous message streams
  throughout the task. And calling it makes offset-tracking logic
  trickier, so better to avoid the need for trickiness. I believe the
  call being here was causing a bug in Kinesis ingestion where a
  message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
  because they prevent keeping currOffsets and lastReadOffsets up to
  date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
  "verifyRecordInRange". It no longer has side effects. It does a sanity
  check on the message offset and also makes sure that it is not past
  the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
  "isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
  believe this fixes an off-by-one error with Kinesis where the last
  record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
  sequence, rather than potentially adjusting any unpublished sequence.
  Adjusting sequences other than the last one is a mistake since it
  will extend their endOffsets beyond what they actually read. (I'm not
  sure if this was an issue in practice, since I'm not sure if real
  world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
  shorter and more clear, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.

Kafka test changes:

- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
  the very end of the task lifecycle still work properly.

Kinesis test changes:

- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
  given Kinesis semantics, the right behavior when start offset equals
  end offset (and there aren't exclusive partitions set) is to read that
  single offset. This is because they are both meant to be treated as
  inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
  message read. I believe the old test was wrong; it expected the task
  not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
  checkpoint starting from 1 rather than 2. I believe the old test was
  wrong here too; it was expecting the task to start reading from the
  checkpointed offset, but it actually should have started reading from
  one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
  11 messages read instead of 12. It's starting at message 0 and reading
  up to 10, which should be 11 messages.

* Changes from code review.
2019-03-15 00:22:42 -07:00
Furkan KAMACI 7ada1c49f9 Prohibit Throwables.propagate() (#7121)
* Throw caught exception.

* Throw caught exceptions.

* Related checkstyle rule is added to prevent further bugs.

* RuntimeException() is used instead of Throwables.propagate().

* Missing import is added.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.

* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.

* Throwable is kept before firing a Runtime Exception.

* Fix unused assignments.
2019-03-14 18:28:33 -03:00
Jonathan Wei c020272add Fix KafkaRecordSupplier assign (#7260)
* Fix KafkaRecordSupplier assign

* TeamCity fix
2019-03-14 14:36:14 +08:00
Clint Wylie fb1489d313 fix SequenceMetadata deserialization (#7256)
* wip

* fix tests, stop reading if we are at end offset

* fix build

* remove restore at end offsets fix in favor of a separate PR

* use typereference from method for serialization too
2019-03-13 17:29:39 -04:00
Jihoon Son 32e86ea75e Fix record validation in SeekableStreamIndexTaskRunner (#7246)
* Fix record validation in SeekableStreamIndexTaskRunner

* add kinesis test
2019-03-12 21:12:21 -07:00
Gian Merlino fa218f5160 Fix two SeekableStream serde issues. (#7176)
* Fix two SeekableStream serde issues.

1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed
   for split 0.13 / 0.14 clusters to work properly during a rolling update.
2) Abstract classes don't need JsonCreator constructors; remove them.

* Comment fixes.
2019-03-01 22:27:08 -08:00
Surekha 80a2ef7be4 Support kafka transactional topics (#5404) (#6496)
* Support kafka transactional topics

* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests

* Remove unused import from test

* Fix compilation

* Invoke transaction api to fix a unit test

* temporary modification of travis.yml for debugging

* another attempt to get travis tasklogs

* update kafka to 2.0.1 at all places

* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes

* Add deprecated in docs for kafka-eight and kafka-simple extensions

* Remove skipOffsetGaps and code changes for transaction support

* Fix indentation

* remove skipOffsetGaps from kinesis

* Add transaction api to KafkaRecordSupplierTest

* Fix indent

* Fix test

* update kafka version to 2.1.0
2019-02-18 11:50:08 -08:00
Roman Leventov 0e926e8652 Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898)
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool

* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java

* Remove unnecessary comment

* Checkstyle

* Fix getFromExtensions()

* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization

* Fix UriCacheGeneratorTest

* Workaround issue with MaterializedViewQueryQueryToolChest

* Strengthen Appenderator's contract regarding concurrency
2019-02-04 09:18:12 -08:00
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Mingming Qiu 6761663509 make kafka poll timeout can be configured (#6773)
* make kafka poll timeout can be configured

* add doc

* rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS
2019-01-03 12:16:02 +08:00
Joshua Sun 7c7997e8a1 Add Kinesis Indexing Service to core Druid (#6431)
* created seekablestream classes

* created seekablestreamsupervisor class

* first attempt to integrate kafa indexing service to use SeekableStream

* seekablestream bug fixes

* kafkarecordsupplier

* integrated kafka indexing service with seekablestream

* implemented resume/suspend and refactored some package names

* moved kinesis indexing service into core druid extensions

* merged some changes from kafka supervisor race condition

* integrated kinesis-indexing-service with seekablestream

* unite tests for kinesis-indexing-service

* various bug fixes for kinesis-indexing-service

* refactored kinesisindexingtask

* finished up more kinesis unit tests

* more bug fixes for kinesis-indexing-service

* finsihed refactoring kinesis unit tests

* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions

* kinesis-indexing-service code cleanup and docs

* merge #6291

merge #6337

merge #6383

* added more docs and reordered methods

* fixd kinesis tests after merging master and added docs in seekablestream

* fix various things from pr comment

* improve recordsupplier and add unit tests

* migrated to aws-java-sdk-kinesis

* merge changes from master

* fix pom files and forbiddenapi checks

* checkpoint JavaType bug fix

* fix pom and stuff

* disable checkpointing in kinesis

* fix kinesis sequence number null in closed shard

* merge changes from master

* fixes for kinesis tasks

* capitalized <partitionType, sequenceType>

* removed abstract class loggers

* conform to guava api restrictions

* add docker for travis other modules test

* address comments

* improve RecordSupplier to supply records in batch

* fix strict compile issue

* add test scope for localstack dependency

* kinesis indexing task refactoring

* comments

* github comments

* minor fix

* removed unneeded readme

* fix deserialization bug

* fix various bugs

* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix

* minor changes to kinesis

* implement deaggregate for kinesis

* Merge remote-tracking branch 'upstream/master' into seekablestream

* fix kinesis offset discrepancy with kafka

* kinesis record supplier disable getPosition

* pr comments

* mock for kinesis tests and remove docker dependency for unit tests

* PR comments

* avg lag in kafkasupervisor #6587

* refacotred SequenceMetadata in taskRunners

* small fix

* more small fix

* recordsupplier resource leak

* revert .travis.yml formatting

* fix style

* kinesis docs

* doc part2

* more docs

* comments

* comments*2

* revert string replace changes

* comments

* teamcity

* comments part 1

* comments part 2

* comments part 3

* merge #6754

* fix injection binding

* comments

* KinesisRegion refactor

* comments part idk lol

* can't think of a commit msg anymore

* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner

* commmmmmmmmmments

* extra error handling in KinesisRecordSupplier getRecords

* comments

* quickfix

* typo

* oof
2018-12-21 12:49:24 -07:00
Jihoon Son 4591c56afb Fix error handling after pause request in Kafka supervisor (#6754)
* Fix error handling after pause request in kafka supervisor

* fix test

* fix test
2018-12-18 17:52:44 -08:00
Roman Leventov ec38df7575
Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606)
* Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns

* Fix style

* Fix HttpEmitterTest.timeoutEmptyQueue()

* Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests

* Clarify code
2018-12-01 01:12:56 +01:00
Mingming Qiu c5405bb592 emit maxLag/avgLag in KafkaSupervisor (#6587)
* emit maxLag/totalLag/avgLag in KafkaSupervisor

* modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility
2018-11-28 02:11:14 -08:00
Roman Leventov 87b96fb1fd
Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Joshua Sun 7928e81e3f kafkasupervisor checkpointing bug (#6639) 2018-11-16 16:09:55 -08:00
hate13 c4cb4b4909 optimize loading end offsets of all partitions (#6623) 2018-11-16 15:24:01 +08:00
Jihoon Son d738ce4d2a Enforce logging when killing a task (#6621)
* Enforce logging when killing a task

* fix test

* address comment

* address comment
2018-11-16 10:01:56 +08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Clint Wylie e326086604 fix kafka indexing task not processing through end offsets on publish, fixes #6602 (#6603) 2018-11-12 14:27:32 -08:00
Roman Leventov 54351a5c75 Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490)
* Fix various bugs; Enable more IntelliJ inspections and update error-prone

* Fix NPE

* Fix inspections

* Remove unused imports
2018-11-06 14:38:08 -08:00
QiuMM 676f5e6d7f Prohibit some guava collection APIs and use JDK collection APIs directly (#6511)
* Prohibit some guava collection APIs and use JDK APIs directly

* reset files that changed by accident

* sort codestyle/druid-forbidden-apis.txt alphabetically
2018-10-29 13:02:43 +01:00
Joshua Sun f7753ef1e2 fix KafkaSupervisor stats report error (#6508)
* fix kafkasupervisor stats 500

* added unit test

* throw error if group already exists
2018-10-25 15:45:54 -07:00
Roman Leventov 84ac18dc1b
Catch some incorrect method parameter or call argument formatting patterns with checkstyle (#6461)
* Catch some incorrect method parameter or call argument formatting patterns with checkstyle

* Fix DiscoveryModule

* Inline parameters_and_arguments.txt

* Fix a bug in PolyBind

* Fix formatting
2018-10-23 07:17:38 -03:00
elloooooo 1e82b6291e Remove consumer.listTopics() method in case when too many topics in kafka causes the FullGC in Overlord (#6455)
* remove consumer.listTopics() method

* add consumerLock and exception handling for consumer.partitionFor() and remove some useless checks

* add check in case consumer.partitionsFor() returns null

* fix CI failure

* fix failed UT

* Revert "fix CI failure"

This reverts commit f839d09e1e.

* revert unless commit and re-commit the useful part to fix failed UT
2018-10-22 10:46:31 -07:00
Roman Leventov aa121da25f Use NodeType enum instead of Strings (#6377)
* Use NodeType enum instead of Strings

* Make NodeType constants uppercase

* Fix CommonCacheNotifier and NodeType/ServerType comments

* Reconsidering comment

* Fix import

* Add a comment to CommonCacheNotifier.NODE_TYPES
2018-10-14 20:49:38 -07:00
Roman Leventov e3397ba00f Enforce Druid's exception class use (#6456) 2018-10-13 16:35:14 -07:00
Atul Mohan ab7b4798cc Securing passwords used for SSL connections to Kafka (#6285)
* Secure credentials in consumer properties

* Merge master

* Refactor property population into separate method

* Fix property setter

* Fix tests
2018-10-11 10:03:01 -07:00
Jonathan Wei c7ac8785a1 Prevent failed KafkaConsumer creation from blocking overlord startup (#6383)
* Prevent failed KafkaConsumer creation from blocking overlord startup

* PR comments

* Fix random task ID length

* Adjust test timer

* Use Integer.SIZE
2018-10-03 19:08:20 -07:00
Roman Leventov 3ae563263a
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957)
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:
 - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
 - Generified `ComplexMetricExtractor`
2018-10-02 14:50:22 -03:00
QiuMM 993bc5e9d3 Fix Kafka Indexing Service notice handle thread may never terminate (#6337)
* Fix Kafka Indexing Service notice handle thread may never terminate

* address comment

* handle null value
2018-09-26 20:09:53 -07:00
QiuMM 00ea8c00ac using Entry directly instead of Map.Entry in KafkaSupervisor (#6291) 2018-09-26 19:01:36 -07:00
QiuMM 255214cbe6 correct variable name in KafkaSupervisor (#6354) 2018-09-20 16:22:03 -07:00
Joshua Sun 4fafc2ccc9 fixes race condition in kafkasupervisor (#6304)
* fixes race condition in kafkasupervisor

* async verify checkpoints

* fixes race condition in kafkasupervisor

* replace commonly used methods with variables

* remove countdownlatch import

* reformat

* fixes
2018-09-18 22:37:22 -07:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00
Roman Leventov d50b69e6d4 Prohibit LinkedList (#6112)
* Prohibit LinkedList

* Fix tests

* Fix

* Remove unused import
2018-09-13 18:07:06 -07:00
Clint Wylie 91a37c692d 'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234)
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'

* move overlord console status to own column in supervisor table so does not look like garbage

* spacing

* padding

* other kind of spacing

* fix rebase fail

* fix more better

* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests

* fix log
2018-09-13 14:42:18 -07:00
Clint Wylie e6e068ce60 Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129)
* resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask

* address review comments

* changes due to review

* merge fail
2018-09-07 13:17:49 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Jonathan Wei c9a27e3e8e
Don't let catch/finally suppress main exception in IncrementalPublishingKafkaIndexTaskRunner (#6258) 2018-08-28 16:12:02 -07:00
Jihoon Son bda5a8a95e Fix NPE in KafkaSupervisor.checkpointTaskGroup (#6206)
* Fix NPE in KafkaSupervisor.checkpointTaskGroup

* address comments

* address comment
2018-08-26 22:23:33 -07:00
Benedict Jin 3647d4c94a Make time-related variables more readable (#6158)
* Make time-related variables more readable

* Patch some improvements from the code reviewer

* Remove unnecessary boxing of Long type variables
2018-08-21 15:29:40 -07:00
Jihoon Son 2bfe1b6a5a Fix NPE for taskGroupId when rolling update (#6168)
* Fix NPE for taskGroupId

* missing changes

* fix wrong annotation

* fix potential race

* keep baseSequenceName

* make deprecated old param
2018-08-17 10:15:45 -07:00
Gian Merlino 5ce3185b9c Fix three bugs with segment publishing. (#6155)
* Fix three bugs with segment publishing.

1. In AppenderatorImpl: always use a unique path if requested, even if the segment
   was already pushed. This is important because if we don't do this, it causes
   the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
   a "not published" result instead of throwing an exception, when there was one
   metadata update failure, followed by some random exception. This is done by
   resetting the AtomicBoolean that tracks what case we're in, each time the
   callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
   publish result. Skip killing if we just got some exception. The reason for this
   is that we want to avoid killing segments if they are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.

* Remove javadoc-only import.

* Updates.

* Fix test.

* Fix tests.
2018-08-15 13:55:53 -07:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00