* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute
Run in jobby is now split into 2
1. submitAndGetHadoopJobId followed by 2. run
submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion
JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner
HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called
Hadoop*Job files have been changed to incorporate the changes to jobby
* Addressing PR comments
* Addressing PR comments - Fix taskDir
* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task
* Addressing PR comments
1. Formatting
2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself
* Addressing PR comments
1. POM change. Moving hadoop dependency to indexing-hadoop
* Addressing PR comments
1. stopGracefully now accepts TaskConfig as a param
Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
Changing tests to make TaskConfig param isRestoreOnRestart to true
* created seekablestream classes
* created seekablestreamsupervisor class
* first attempt to integrate kafa indexing service to use SeekableStream
* seekablestream bug fixes
* kafkarecordsupplier
* integrated kafka indexing service with seekablestream
* implemented resume/suspend and refactored some package names
* moved kinesis indexing service into core druid extensions
* merged some changes from kafka supervisor race condition
* integrated kinesis-indexing-service with seekablestream
* unite tests for kinesis-indexing-service
* various bug fixes for kinesis-indexing-service
* refactored kinesisindexingtask
* finished up more kinesis unit tests
* more bug fixes for kinesis-indexing-service
* finsihed refactoring kinesis unit tests
* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions
* kinesis-indexing-service code cleanup and docs
* merge #6291
merge #6337
merge #6383
* added more docs and reordered methods
* fixd kinesis tests after merging master and added docs in seekablestream
* fix various things from pr comment
* improve recordsupplier and add unit tests
* migrated to aws-java-sdk-kinesis
* merge changes from master
* fix pom files and forbiddenapi checks
* checkpoint JavaType bug fix
* fix pom and stuff
* disable checkpointing in kinesis
* fix kinesis sequence number null in closed shard
* merge changes from master
* fixes for kinesis tasks
* capitalized <partitionType, sequenceType>
* removed abstract class loggers
* conform to guava api restrictions
* add docker for travis other modules test
* address comments
* improve RecordSupplier to supply records in batch
* fix strict compile issue
* add test scope for localstack dependency
* kinesis indexing task refactoring
* comments
* github comments
* minor fix
* removed unneeded readme
* fix deserialization bug
* fix various bugs
* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix
* minor changes to kinesis
* implement deaggregate for kinesis
* Merge remote-tracking branch 'upstream/master' into seekablestream
* fix kinesis offset discrepancy with kafka
* kinesis record supplier disable getPosition
* pr comments
* mock for kinesis tests and remove docker dependency for unit tests
* PR comments
* avg lag in kafkasupervisor #6587
* refacotred SequenceMetadata in taskRunners
* small fix
* more small fix
* recordsupplier resource leak
* revert .travis.yml formatting
* fix style
* kinesis docs
* doc part2
* more docs
* comments
* comments*2
* revert string replace changes
* comments
* teamcity
* comments part 1
* comments part 2
* comments part 3
* merge #6754
* fix injection binding
* comments
* KinesisRegion refactor
* comments part idk lol
* can't think of a commit msg anymore
* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner
* commmmmmmmmmments
* extra error handling in KinesisRecordSupplier getRecords
* comments
* quickfix
* typo
* oof
* Double-checked locking bug is fixed.
* @Nullable is removed since there is no need to use along with @MonotonicNonNull.
* Static import is removed.
* Lazy initialization is implemented.
* Local variables used instead of volatile ones.
* Local variables used instead of volatile ones.
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql
* fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays
* remove leftover print
* convert micro timestamp to millis
* checkstyle
* add ignore for .parquet and .parq to rat exclude
* fix legit test failure from avro flattern behavior change
* fix rebase
* add exclusions to pom to cut down on redundant jars
* refactor tests, add support for unwrapping lists for parquet-avro, review comments
* more comment
* fix oops
* tweak parquet-avro list handling
* more docs
* fix style
* grr styles
* Apache-ize POM
* put revision information into MANIFEST.MF for binary release
* remove nightly profile
* fix flaky travis by overriding maven-remote-resources-plugin execution from the parent POM
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* SQL: Update to Calcite 1.17.0.
Other than keeping things fresh, another motivation is that
this fixes CALCITE-1436 (AggregateNode NPE for aggregators other
than SUM/COUNT), which affects aggregate functions on our system
tables.
Also sets shouldConvertRaggedUnionTypesToVarying = true, a new
type system parameter that prefers VARCHAR over CHAR. This is
better for Druid, because we don't really have support for a
true CHAR type.
* Remove unused import.
* Adding licenses and enable apache-rat-plugi.
Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18
* restore the copywrite of demo_table and add it to the list of allowed ones
Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14
* revirew comments
Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca
* more fixup
Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2
* align
Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
* allow 1 retry for failing tests idk if this is a good idea, but false failure rate due to flaky tests seems pretty bad lately
* try to fix retry issue with teardown
* Update pom.xml
* Update pom.xml
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.
* SQL: Support more result formats, add columns header.
- Add result formats for line-based JSON and CSV.
- Add X-Druid-Sql-Columns header with a list of all columns that
the response will contain.
- Add more comprehensive documentation on what callers should expect
when making Druid SQL queries.
* Fix some tests.
* Adjust tests.
* Adjust trailer, add types header.
* Fix trailers.
* Update JSONPath Library
Re: #5792
- Add a unit test containing a JSONPath conditional
- Update the JSONPath library and no longer exclude the json-smart dependency.
- I believe the original reason for excluding this has been fixed: https://github.com/json-path/JsonPath/pull/315
* Add test
* Fix test
* implement materialized view
* modify code according to jihoonson's comments
* modify code according to jihoonson's comments - 2
* add documentation about materialized view
* use new HadoopTuningConfig in pr 5583
* add minDataLag and fix optimizer bug
* correct value of DEFAULT_MIN_DATA_LAG_MS
* modify code according to jihoonson's comments - 3
* use the boolean expression instead of if-else
* Use the official aws-sdk instead of jet3t
* fix compile and serde tests
* address comments and fix test
* add http version string
* remove redundant dependencies, fix potential NPE, and fix test
* resolve TODOs
* fix build
* downgrade jackson version to 2.6.7
* fix test
* resolve the last TODO
* support proxy and endpoint configurations
* fix build
* remove debugging log
* downgrade hadoop version to 2.8.3
* fix tests
* remove unused log
* fix it test
* revert KerberosAuthenticator change
* change hadoop-aws scope to provided in hdfs-storage
* address comments
* address comments
* Future-proof some Guava usage
* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20
* Use `Collections.emptyIterator()`
* Pretty formatting
* Make listenable future transforms a thing in default druid
* Format fix
* Add forbidden guava apis
* Make the ListenableFutrues.transformAsync have comments
* Undo intellij bad pattern matching in comments
* Futrues --> Futures
* Add empty iterators forbidding
* Fix extra `A`
* Correct method signature
* Address review comments
* Finish Gian review comments
* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
Druid relies on the page cache of Linux in order to have memory segments.
However when loading segments from deep storage or rebalancing the page
cache can get poisoned by segments that should not be in memory yet.
This can significantly slow down Druid in case rebalancing happens
as data that might not be queried often is suddenly in the page cache.
This PR implements the same logic as is in Apache Cassandra and Apache
Bookkeeper.
Closes#4746
* opentsdb emitter extension
* doc for opentsdb emitter extension
* update opentsdb emitter doc
* add the ms unit to the constant name
* add a configurable event limit
* fix version to 0.13.0-SNAPSHOT
* using a thread to consume metric event
* rename method and parameter