* Broker: Await initialization before finishing startup.
In particular, hold off on announcing the service and starting the
HTTP server until the server view and SQL metadata cache are finished
initializing. This closes a window of time where a Broker could return
partial results shortly after startup.
As part of this, some simplification of server-lifecycle service
announcements. This helps ensure that the two different kinds of
announcements we do (legacy and new-style) stay in sync.
* Remove unused imports.
* Fix NPE in ServerRunnable.
* make logs that are only useful for debugging be at debug level so log volume is much more chill
* info level messages for total merge buffer allocated/free
* more chill compaction logs
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.
Fixes two bugs:
1) FilteredRequestLogger did not start/stop the delegate.
2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.
* Remove unneeded annotations.
* autosize processing buffers based on direct memory sizing
* remove oops, more test
* max 1gb autosize buffers, test, start of docs
* fix oops
* revert accidental change
* print buffer size in exception
* change the things
* remove AbstractResourceFilter.isApplicable because it is not, add tests for OverlordResource.doShutdown and OverlordResource.shutdownTasksForDatasource
* cleanup
* Fix missing default config in some calls to coordinator dynamic configs.
The lack of a default config meant that if someone called an API
_without_ a default config before one _with_ a default config, then
the default value would get stuck at null instead of the intended
default value. I noticed this in a cluster where calling /druid/coordinator/v1/config
before a coordinator had fully started up would lead to NPEs during
DruidCoordinatorRuleRunner.
This patch makes the default configs consistent across all calls.
* Remove unnecessary null check.
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Use current coordinator leader instead of cached one (#6551)
Check the response status and throw exception if not OK
* Modify tests
* PR comment
* Add the correct check for status of BytesAccumulatingResponseHandler
* Move the status check into JsonParserIterator so sql query outputs meaningful message on failure
* Fix tests
* Period load/drop/broadcast rules should include the future by default
* address comments
* adjust coordinator console and tweak docs
* address comments
* fix travis-ci
This PR allows to control the fields in `RequestLogEvent`, emitted in `EmittingRequestLogger`. In our case, we want to get rid of the `intervals` fields of the query objects that are a part of `DefaultRequestLogEvent`. They are enormous (thousands of segments) and not useful.
Related to #5522, FYI @a2l007.
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* 1. Mysql default transaction isolation is REPEATABLE_READ, treat it as READ_COMMITTED will reduce insert id conflict.
2. Add an index to 'dataSource used end' is work well for the most of scenarios(get recently segments), and it will speed up sync add pending segments in DB.
3. 'select and insert' is not need within transaction.
* Use TaskLockbox.doInCriticalSection instead of synchronized syntax to speed up insert pending segments.
* fix typo for NullPointerException
* Use NodeType enum instead of Strings
* Make NodeType constants uppercase
* Fix CommonCacheNotifier and NodeType/ServerType comments
* Reconsidering comment
* Fix import
* Add a comment to CommonCacheNotifier.NODE_TYPES
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
o- if Accept header is absent, it defaults to Content-Type header
* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033
Minor change to a comment - restoring to previous wording
* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033
o- minor change to check for empty string
* Fix inconsistent segment size(#6448)
* Fix the segment size for published segments
* Changes to get numReplicas
* Make coordinator segments API truly streaming
* Changes to store partial segment data
* Simplify SegmentMetadataHolder
* Store partial the columns from available segments
* Address comments
* Added SystemSchema with following tables (#5989)
* SEGMENTS table provides details on served and published segments
* SERVERS table provides details on data servers
* SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS
* TASKS table provides details on tasks
* Add documentation for system schema
* Fix static-analysis warnings
* Address PR comments
*Add unit tests
* Fix a test
* Try to fix a test
* Fix a bug around replica count
* rename io.druid to org.apache.druid
* Major change is to make tasks and segment queries streaming
* Made tasks/segments stream to calcite instead of storing it in memory
* Add num_rows to segments table
* Refactor JsonParserIterator
* Replace with closeable iterator
* Fix docs, make num_rows column nullable, some unit test changes
* make num_rows column type long, allow it to be null
fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler
* Filter null rows for segments table from Linq4j enumerable
* change num_replicas datatype to long in segments table
* Fix some tests and address comments
* Doc updates, other PR comments
* Update tests
* Address comments
* Add auth check
* Update docs
* Refactoring
* Fix teamcity warning, change the getQueryableServer in TimelineServerView
* Fix compilation after rebase
* Use the stream API from AuthorizationUtils
* Added LeaderClient interface and NoopDruidLeaderClient class
* Revert "Added LeaderClient interface and NoopDruidLeaderClient class"
This reverts commit 100fa46e39.
* Make the naming consistent to server_segments for the join table
* Add ForbiddenException on auth check failure
* Remove static block from SystemSchema
* Try to fix a test in CalciteQueryTest due to rename of server_segments
* Fix the json output format in the coordinator API
* Add auth check in the segments API
* Add null check to avoid NPE
* Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark
* Fix test failures, type long/BIGINT can be nullable
* Revert long nullability to fix tests
* Fix style for tests
* PR comments
* Address PR comments
* Add the missing BytesAccumulatingResponseHandler class
* Use Sequences.withBaggage in DruidPlanner
* Fix docs, add comments
* Close the iterator if hasNext returns false
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* Added backpressure metric
* Updated channelReadable to AtomicBoolean and fixed broken test
* Moved backpressure metric logic to NettyHttpClient
* Fix placement of calculating backPressureDuration
* Add support targetCompactionSizeBytes for compactionTask
* fix test
* fix a bug in keepSegmentGranularity
* fix wrong noinspection comment
* address comments
The indexes introduced in #6348 were on the wrong table. The tests
did not catch them due to retries on the create table steps (the
first try created the table but not the bogus indexes; the second
try noticed that the table already existed and did nothing). This
patch doesn't fix the issue with the tests, since the best way to
do that would be to do the table and index creation in a
transaction; but, this is not supported by all of our supported
database engines.
* Adding licenses and enable apache-rat-plugi.
Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18
* restore the copywrite of demo_table and add it to the list of allowed ones
Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14
* revirew comments
Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca
* more fixup
Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2
* align
Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
* move overlord console status to own column in supervisor table so does not look like garbage
* spacing
* padding
* other kind of spacing
* fix rebase fail
* fix more better
* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests
* fix log
* Broker backpressure.
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.
Fixes#4933.
* Fix query context doc.
* resolves#5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask
* address review comments
* changes due to review
* merge fail
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.
* Fix all inspection errors currently reported.
TeamCity builds on master are reporting inspection errors, possibly
because there was a while where it was not running due to the Apache
migration, and there was some drift.
* Fix one more location.
* Fix tests.
* Another fix.
* Fix three bugs with segment publishing.
1. In AppenderatorImpl: always use a unique path if requested, even if the segment
was already pushed. This is important because if we don't do this, it causes
the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
a "not published" result instead of throwing an exception, when there was one
metadata update failure, followed by some random exception. This is done by
resetting the AtomicBoolean that tracks what case we're in, each time the
callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
publish result. Skip killing if we just got some exception. The reason for this
is that we want to avoid killing segments if they are in an unknown state.
Two other changes to clarify the contracts a bit and hopefully prevent future bugs:
1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.
* Remove javadoc-only import.
* Updates.
* Fix test.
* Fix tests.
* Cache: Add maxEntrySize config.
The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.
Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.
The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.
* Add CachePopulatorStats.
* Fix whitespace.
* Fix docs.
* Fix various tests.
* Add tests.
* Fix tests.
* Better tests
* Remove conflict markers.
* Fix licenses.
* Native parallel indexing without shuffle
* fix build
* fix ci
* fix ingestion without intervals
* fix retry
* fix retry
* add it test
* use chat handler
* fix build
* add docs
* fix ITUnionQueryTest
* fix failures
* disable metrics reporting
* working
* Fix split of static-s3 firehose
* Add endpoints to supervisor task and a unit test for endpoints
* increase timeout in test
* Added doc
* Address comments
* Fix overlapping locks
* address comments
* Fix static s3 firehose
* Fix test
* fix build
* fix test
* fix typo in docs
* add missing maxBytesInMemory to doc
* address comments
* fix race in test
* fix test
* Rename to ParallelIndexSupervisorTask
* fix teamcity
* address comments
* Fix license
* addressing comments
* addressing comments
* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator
* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner
* Add more javadocs
* use StringUtils.nonStrictFormat for logging
* fix typo and remove unused class
* fix tests
* change package
* fix strict build
* tmp
* Fix overlord api according to the recent change in master
* Fix it test
* Optimize per-segment queries
* Always optimize, add unit test
* PR comments
* Only run IntervalDimFilter optimization on __time column
* PR comments
* Checkstyle fix
* Add test for non __time column
* Remove some unnecessary task storage internal APIs.
- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.
This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.
IMO, this is an improvement, since the most likely reason for bad tasks is either:
- A new version introduced an additional validation, and a pre-existing task doesn't
pass it.
- You are rolling back from a newer version to an older version.
In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.
* Remove unused import.
* Fix formatting.
* Fix formatting.
* Various changes about druid-services module
* Patch improvements from reviewer
* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile
* Fix ArraysAsListWithZeroOrOneArgument
* Fix conflict
* Fix ToArrayCallWithZeroLengthArrayArgument
* Fix AliEqualsAvoidNull
* Remove blank line
* Remove unused import clauses
* Fix code style in TopNQueryRunnerTest
* Fix conflict
* Don't use Collections.singletonList when converting the type of array type
* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase
* Roll back the latest commit
* Add java.io.File#toURL() into druid-forbidden-apis
* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord
* Add a new regexp element into stylecode xml file
* Fix style error for new regexp
* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING
* Fix style error for new regexp
* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile
* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR
* Add toArray(new Object[0]) regexp into checkstyle config file & fix them
* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it
* Add a comment for string equals regexp in checkstyle config
* Fix code format
* Add RedundantTypeArguments as ERROR level inspection
* Fix cannot resolve symbol datasource
* Add support to filter on datasource for active tasks
* Added datasource filter to sql query for active tasks
* Fixed unit tests
* Address PR comments
* Fix NPE while handling CheckpointNotice
* fix code style
* Fix test
* fix test
* add a log for creating a new taskGroup
* fix backward compatibility in KafkaIOConfig
* this will fix it
* filter destinations to not consider servers already serving segment
* fix it
* cleanup
* fix opposite day in ImmutableDruidServer.equals
* simplify
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
* Fix SQL Server select query in createInactiveStatusesSinceQuery() method.
SQL server does not support LIMIT N in select queries. Instead it has TOP N to limiting number of query results.
And TOP N is already added in the select statement as per maxNumStatuses value.
* Add parentheses for TOP in SELECT statement as SQL Servers no longer support TOP without parentheses.
* Add the new tasks api in overlordResource
It takes 4 optional query params
* state(pending/running/waiting/compelte)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)
If all params are null, the api returns all the tasks
* Add the state to each task returned by tasks endpoint
* divide active tasks into waiting, pending or running
* Add more unit tests
* Add UNKNOWN state to TaskState
* Fix the authorization calls
* WIP: PR comments
Added new class to capture task info for caching
Other refactoring
* Refactoring : move TaskStatus class to druid-api
so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api
* Remove unused class and apis accessing it
* Add a separate cache for recently completed tasks
This is to mainly capture the task type from payload
* Ignore a test
* Add a RuntimeTaskState to encompass all states a task can be in
* Revert "Add a RuntimeTaskState to encompass all states a task can be in"
This reverts commit 2a527a0731.
* Fix wrong api call
* Fix and unignore tests
* Remove waiting,pending state from TaskState
* Add RunnerTaskState
* Missed the annotation runnerStatusCode
* Fix the creationTime
* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests
* Add javadocs
* Potentially fix the teamcity build
* Address PR comments
*Get rid of TaskInfoBuilder
*Make TaskInfoMapper static nested class
*Other changes
* fix import in MaterializedViewSupervisor after merge
* Address PR comments on
* Replace global cache with local map
* combine multiple queries into one
* Removed unused code
* Fix unit tests
Fix a bug in securedTaskStatusPlus
* Remove getRecentlyFinishedTaskStatuses method
Change TaskInfoMapper signature to add generic type
* Address PR comments
* Passed datasource as argument to be used in sql query
* Other minor fixes
* Address PR comments
*Some minor changes, rename method, spacing changes
* Add early auth check if datasource is not null
* Fix test case
* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object
* Address PR comments
* Fix a bug in test case causing ClassCastException
* implement materialized view
* modify code according to jihoonson's comments
* modify code according to jihoonson's comments - 2
* add documentation about materialized view
* use new HadoopTuningConfig in pr 5583
* add minDataLag and fix optimizer bug
* correct value of DEFAULT_MIN_DATA_LAG_MS
* modify code according to jihoonson's comments - 3
* use the boolean expression instead of if-else
* Fix defaultQueryTimeout
- set default timeout in query context before query fail time is evaluated
Remove unused import
* Address failing checks
* Addressing code review comments
* Removed line that was no longer used
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.
* Adding documentation
* Removed some fields from class AnonymousAuthenticator
* Updating docs
* fix freeSpacePercent in segmentCache.locations
* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash
* add test to verify correct behavior
* switch to Path and test with jimfs
Path allows to use different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.
The behavior should be unchanged by this commit.
* Revert "switch to Path and test with jimfs"
This reverts commit 8b9a418d65.
* VersionedIntervalTimeline: Optimize construction with heavily populated holders.
Each time a segment is "add"ed to a timeline, "isComplete" is called on the holder
that it is added to. "isComplete" is an O(segments per chunk) operation, meaning
that adding N segments to a chunk is an O(N^2) operation. This blows up badly if
we have thousands of segments per chunk.
The patch defers the "isComplete" check until after all segments have been
inserted.
* Fix imports.
* The check for maxBytesInMemory should be >= 0 instead of > 0
* if the default value is 0, the actual check could be skipped
* fix the message for persistReasons
* Address PR comments
* if maxBytes set -1, make is Long.MAX_VAL, so we do not need to check if it's 0 or -1
* set the maxBytesTuningconfig in AppenderatorImpl constructor to avoid duplicate code
* fix the failing test cases
* Address PR comments
* Adding decoration method to proxy servlet
Change-Id: I872f9282fb60bfa20524271535980a36a87b9621
* moving the proxy request decoration to authenticators
Change-Id: I7f94b9ff5ecf08e8abf7169b58bc410f33148448
* added docs
Change-Id: I901543e52f0faf4666bfea6256a7c05593b1ae70
* use the authentication result to decorate request
Change-Id: I052650de9cd02b4faefdbcdaf2332dd3b2966af5
* adding authenticated by name
Change-Id: I074d2933460165feeddb19352eac9bd0f96f42ca
* ensure that authenticator is not null
Change-Id: Idb58e308f90db88224a06f3759114872165b24f5
* fix types and minor bug
Change-Id: I6801d49a05d5d8324406fc0280286954eb66db10
* fix typo
Change-Id: I390b12af74f44d760d0812a519125fbf0df4e97b
* use actual type names
Change-Id: I62c3ee763363781e52809ec912aafd50b8486b8e
* set authenitcatedBy to null for AutheticationResults created by
Escalator.
Change-Id: I4a675c372f59ebd8a8d19c61b85a1e4bf227a8ba
* add default caffeine cache size based on runtime Xmx or max 1GB
* update docs for caffeine cache
* fix formatting
* test caffeine size should never be less than 0
* set caffeine max default size to 1G not 1M
* fix caffeine cache tests
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Fix check style and remove a comment
* Add overlord unsecured paths to coordinator when using combined service (#5579)
* Add overlord unsecured paths to coordinator when using combined service
* PR comment
* More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals
* Indentation issues
* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR #5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
* Fix taskDuration docs for KafkaIndexingService (#5572)
* With incremental handoff the changed line is no longer true.
* Add doc for automatic pendingSegments (#5565)
* Add missing doc for automatic pendingSegments
* address comments
* Fix indexTask to respect forceExtendableShardSpecs (#5509)
* Fix indexTask to respect forceExtendableShardSpecs
* add comments
* Deprecate spark2 profile in pom.xml (#5581)
Deprecated due to https://github.com/druid-io/druid/pull/5382
* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)
Also switch various firehoses to the new method.
Fixes#5585.
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Address code review comments
* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues
* Address more code review comments
* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase
* Fix some style checks
* Merge conflicts
* Fix failing tests
Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex
* Address PR comments
* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods
* Fix TeamCity inspection warnings
* Added maxBytesInMemory config to HadoopTuningConfig
* Updated the docs and examples
* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples
* Set maxBytesInMemory to 0 until used
Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts
* Update toString in KafkaSupervisorTuningConfig
* Use correct maxBytesInMemory value in AppenderatorImpl
* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory
Experimenting with various defaults, 1/3 jvm memory causes OOM
* Update docs to correct maxBytesInMemory default value
* Minor to rename and add comment
* Add more details in docs
* Address new PR comments
* Address PR comments
* Fix spelling typo
The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.
I don't see any reason for these two locks to be the same lock, so I split them.
* Fix coordinator's dataSource api with full parameter
* address comment
* Add a constructor for json serde and fix result order
* Change to immutableSortedMap
* Revert immutableSortedMap to treeMap
* Add getters for AlertEvent
* Move PublicApi and ExtensionPoint to java-util
* Fix publicapi annotation usage
* Add publicapi annotations to ServiceMetricEvent and RequestLogEvent
* Add config to allow setting up custom unsecured paths for druid nodes.
* return all resources for Unsecured paths
* review comment - Add test
* fix tests
* fix test
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* drop selection through cost balancer
* use collections.emptyIterator
* add test to ensure does not drop from server with larger loading queue with cost balancer
* javadocs and comments to clear things up
* random drop for completeness
* fix issue where assign primary assigns segments to all historical servers in cluster
* fix test
* add test to ensure primary assignment will not assign to another server while loading is in progress
* Add graceful shutdown timeout
* Handle interruptedException
* Incorporate code review comments
* Address code review comments
* Poll for activeConnections to be zero
* Use statistics handler to get active requests
* Use native jetty shutdown gracefully
* Move log line back to where it was
* Add unannounce wait time
* Make the default retain prior behavior
* Update docs with new config defaults
* Make duration handling on jetty shutdown more consistent
* StatisticsHandler is a wrapper
* Move jetty lifecycle error logging to error
* Use the official aws-sdk instead of jet3t
* fix compile and serde tests
* address comments and fix test
* add http version string
* remove redundant dependencies, fix potential NPE, and fix test
* resolve TODOs
* fix build
* downgrade jackson version to 2.6.7
* fix test
* resolve the last TODO
* support proxy and endpoint configurations
* fix build
* remove debugging log
* downgrade hadoop version to 2.8.3
* fix tests
* remove unused log
* fix it test
* revert KerberosAuthenticator change
* change hadoop-aws scope to provided in hdfs-storage
* address comments
* address comments
* Future-proof some Guava usage
* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20
* Use `Collections.emptyIterator()`
* Pretty formatting
* Make listenable future transforms a thing in default druid
* Format fix
* Add forbidden guava apis
* Make the ListenableFutrues.transformAsync have comments
* Undo intellij bad pattern matching in comments
* Futrues --> Futures
* Add empty iterators forbidding
* Fix extra `A`
* Correct method signature
* Address review comments
* Finish Gian review comments
* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
* Fix round robining in router.
Say that ten times fast.
For query endpoints, AsyncQueryForwardingServlet called hostFinder.getDefaultServer()
to set a default server, followed by hostFinder.getServer(inputQuery) to override it
with query-specific routing. Since hostFinder is round-robin, this skips a server.
When there are only two servers, one server is _always_ skipped and the router sends
all queries to the same broker.
* Adjust spacing.
* adding a properties endpoint in status resource
* checkstyle fixes
* more checkstyle corrections
* correcting the resource filter for properties endpoint
* adding feature of hiding sensitive properties
* checkstyle changes
* review changes for adding default hidden properties and using jackson for arrays value
* making review changes
* Fix early publishing to early pushing in batch indexing & refactor appenderatorDriver
* fix compile
* rename and add more javadocs
* Fix conflicts
* address comments
* revert await executors
* fix test
* Properly set "identity" in query metrics.
This patch adds an "identity" field to QueryPlus and sets it in
QueryLifecycle when the query starts executing. This is important
because it allows it to be used for future QueryMetrics created
by that QueryPlus object.
We also add "identity" to the request-level QueryMetrics object
created in emitLogsAndMetrics.
* Remove unused method.
* Change RealtimeIndexTask to use AppenderatorDriver instead of RealtimePlumber. Related to #4774
* Remove unused throwableDuringPublishing
* Fix usage of forbidden API
* Update realtime index IT to account for not skipping older data any more
* Separate out waiting on publish futures and handoff futures to avoid a race condition where the handoff timeout expires before the segment is published
* #5261 Add separate AppenderatorDriverRealtimeIndexTask and revert changes to RealtimeIndexTask
* #5261 Add separate AppenderatorDriverRealtimeIndexTask and revert changes to RealtimeIndexTask
* #5261 Readability improvements in AppenderatorDriverRealtimeIndexTask. Combine publish and handoff futures in to single future
* #5261 Add separate tuningConfig for RealtimeAppenderatorIndexTask. Revert changes to RealtimeTuningConfig
* #5261 Change JSON type to realtime_appenderator to keep the same naming pattern as RealtimeIndexTask
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.
Both were susceptible to the following conditions:
1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.
* Use StringUtils.format
* Use both Joad Ids and Java IDs as Timezone to string readers
Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3
* move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api
Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0
* exclude class and remove other usage
Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615
* remove it from test classes
Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4
* remove unused
Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
Also remove Guice annotations from LoadQueueTaskMaster, since it is
provided by CliCoordinator, so Guice does not need to know how to
build one directly.
Code changes:
- In the lookup-based extractionFns, inherit injective property from
the lookup itself if not specified.
Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
They are necessary and important since they work with filters and
function cascades -- two things that the dimension specs do not do.
They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
more commonly used, so the docs read better this way.
* just renaming of SegmentChangeRequestHistory etc
* additional change history refactoring changes
* WorkerTaskManager a replica of WorkerTaskMonitor
* HttpServerInventoryView refactoring to extract sync code and robustification
* Introducing HttpRemoteTaskRunner
* Additional Worker side updates
* Fix broken KafkaEmitterConfig parsing
This was a regression introduced in
https://github.com/druid-io/druid/pull/4722
KafkaEmitterConfig property names have dot(.) in the name of properties
and JsonConfigurator behavior was changed to not support that.
Added a test and fixed parsing of properties that have dot(.) in
property names
* Fix test failure
* Add freeSpacePercent config in segment location config to enforce free space while storing segments
* address review comments
* address review comments: more doc on freeSpacePercent and use Double for freeSpacePercent
* FilteredHttpServerInventoryViewProvider to start with always false predicate for each segment discovered
* update HttpServerInventoryViewTest to ensure that predicates are honored
* add docs for HttpServerInventoryView.defaultFilter
* change to javadoc style comment
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
* Fix DataSegmentTest
* Refinements
* Try to fix
* Fix the second DataSegmentTest
* Nullability
* Fix tests
* Fix tests, unify to use TestHelper.getJsonMapper()
* Revert TestUtil as ServerTestHelper, fix tests
* Add newline
* Fix indexing tests
* Fix s3 tests
* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()
* Fix HDFS tests
* Fix HdfsDataSegmentPusherTest
* Capitalize constant names
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments
* take care of review comments
* make getCurrentOffsets call async, keep track of publishing sequence, review comments
* fix setEndoffset duplicate request handling, formatting
* fix unit test
* backward compatibility
* make AppenderatorDriverMetadata backwards compatible
* add unit test
* fix deadlock between persist and push executors in AppenderatorImpl
* fix formatting
* use persist dir instead of work dir
* review comments
* fix deadlock
* actually fix deadlock
* maxQueryTimeout property in runtime properties.
* extra line
* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.
* Fix initialize method.
* remove unused import.
* doc update.
* some more details in doc about query failure..
* minor fix.
* decorating QueryRunner to set and verify context. Added by servers.
* remove whitespace.
* SQL: Improved behavior when implicitly casting strings to date/time literals.
- Handle all flavors of ISO8601 and SQL literals.
- Throw errors on other literals instead of silently transforming them to 0.
* Respect timeZone when format is null.
* use ImmutableDruidDataSource for map and set
* address comments
* unused import
* allow returning only ImmutableDruidDataSource in MetadataSegmentManager
* address comments
* remove TreeSet
* revert to use TreeSet
* use default brokerServiceName when priority is not valid
* use AtomicInteger for NodesHolder.roundRobinIndex
* revert inspectionProfiles change
* adjust TieredBrokerHostSelectorTest
* combine if statements and ensure index does not become negative
* set next index with mod if overflows
* fix codestyle
* use nextIndex
* extract the while loop to a method
* Add retries for coordinator fetch and lookup start in LookupReferencesManager
* Fix LookupConfigTest
* Address comments
* Address more comments
* And address more comments
* Address comms
* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
* Fix havingSpec on complex aggregators.
- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
related classes from druid-server to druid-processing. They probably make
more sense there anyway.
- Adds a SQL query test.
Fixes#4957.
* Remove unused import.
* Introduce "transformSpec" at ingest-time.
It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.
The "expression" fields use the same expression language as other
expression-based feature.
* Remove forbidden api.
* Fix compile error.
* Fix tests.
* Some more changes.
- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
and ingestSegment firehose.
* Fix bad merge.
* Adjust imports.
* Adjust whitespace.
* Make Transform into an interface.
* Add missing annotation.
* Switch logger.
* Switch logger.
* Adjust test.
* Adjustment to handling for DatasourceIngestionSpec.
* Fix test.
* CR comments.
* Remove unused method.
* Add javadocs.
* More javadocs, and always decorate.
* Fix bug in TransformingStringInputRowParser.
* Fix bad merge.
* Fix ISFF tests.
* Fix DORC test.
* Fix binary serialization in caching
The previous caching code just concatenated a list of objects into a
byte array -- this is actually not valid because jackson-databind uses
enumerated references to strings internally, and concatenating multiple
binary serialized objects can throw off the references.
This change uses a single JsonGenerator to serialize the object list
rather than concatenating byte arrays.
* remove unused imports
* introducing CuratorLoadQueuePeon
* HttpLoadQueuePeon based off of current code
* Revert "Remove SegmentLoaderConfig.numLoadingThreads config (#4829)"
This reverts commit d8b3bfa63c.
* SegmentLoadDropHandler copy/pasted from ZkCoordinator
* Revert "1-based counts in ZkCoordinator (#4917)"
This reverts commit e725ff4146.
* remove non-zk part from ZkCoordinator
* remove zk part from SegmentLoadDropHandler
* additional changes for segment load/drop management with http
* address review comments
* add some more logs
* Execs class is moved
* Only consider loaded replicants when computing replication status.
This affects the computation of segment/underReplicated/count and
segment/unavailable/count, as well as the loadstatus?simple and
loadstatus?full APIs.
I'm not sure why they currently consider segments in the load
queues, but it would make more sense to me if they only considered
segments that are actually loaded.
* Fix tests.
* Fix imports.
* Changes for lookup synchronization
* Refactor of Lookup classes
* Minor refactors and doc update
* Change coordinator instance to be retrieved by DruidLeaderClient
* Wait before thread shutdown
* Make disablelookups flag true by default
* Update docs
* Rename flag
* Move executorservice shutdown to finally block
* Update LookupConfig
* Refactoring and doc changes
* Remove lookup config constructor
* Revert Lookupconfig constructor changes
* Add tests to LookupConfig
* Make executorservice local
* Update LRM
* Move ListeningScheduledExecutorService to ExecutorCompletionService
* Move exception to outer block
* Remove check to see future is done
* Remove unnecessary assignment
* Add logging
* Add ability to optionally specify a sequence identifier to reduce the possibility of duplicate events entering the event receiver firehose
* Add ability to optionally specify a sequence identifier to reduce the possibility of duplicate events entering the event receiver firehose
* Add a hard coded limit to the maximum number of possible producer IDs to prevent a malicious (or uninformed) client from overflowing the heap
* add configs to enable fast request failure on broker
* address review comments
* fix styling error
* fix style error
* have enableRequestLimit config instead of having user specify max limit
* add comment
* fix style error
* add UT fo LimitRequestsFilter
* address review comments
* fix test
* make LimitRequestsFilterTest more robust
* fix JettyQosTest
* Fix#4647
* NPE protect bucketInterval as well
* Add test to verify timezone as well
* Also handle case when intervals are already present
* Fix checkstyle error
* Use factory method instead for Datetime
* Use Intervals factory method
* Priority on loading for primary replica
* Simplicity fixes
* Fix on skipping drop for quick return.
* change to debug logging for no replicants.
* Fix on filter logic
* swapping if-else
* Fix on wrong "hasTier" logic
* Refactoring of LoadRule
* Rename createPredicate to createLoadQueueSizeLimitingPredicate
* Rename getHolderList to getFilteredHolders
* remove varargs
* extract out currentReplicantsInTier
* rename holders to holdersInTier
* don't do temporary removal of tier.
* rename primaryTier to tierToSkip
* change LinkedList to ArrayList
* Change MinMaxPriorityQueue in DruidCluster to TreeSet.
* Adding some comments.
* Modify log messages in light of predicates.
* Add in-method comments
* Don't create new Object2IntOpenHashMap for each run() call.
* Cache result from strategy call in the primary assignment to be reused during the same run.
* Spelling mistake
* Cleaning up javadoc.
* refactor out loading in progress check.
* Removed redundant comment.
* Removed forbidden API
* Correct non-forbidden API.
* Precision in variable type for NavigableSet.
* Obsolete comment.
* Clarity in method call and moving retrieval of ServerHolder into method call.
* Comment on mutability of CoordinatoorStats.
* Added auxiliary fixture for dropping.
* Add identity to query metrics, logs.
Also fix a bug where unauthorized requests would not emit any logs or metrics,
and instead would log a "Tried to emit logs and metrics twice" warning.
Also rename QueryResource's "getServer" to "cancelQuery", because that's what
it does.
* Do not emit identity by default.
* remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form
* sanitize output of /druid/coordinator/v1/cluster endpoint
* Added org.joda.time.DateTime#(java.lang.String) to forbidden API.
* Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API.
* Add additional APIs that may create DateTime with default time zone
* Add helper function that accepts formatter to parse String.
* Add additional forbidden APIs
* Replace existing usage of forbidden APIs
* Use wrapper class to enforce Chronology on DateTimeFormatter.
* Creates constant UtcFormatter for constant ISODateTimeFormat.
* Move caffeine out of extension.
* Remove `JsonTypeName` from the class itself
* Fix bad docs
* Fix distribution pom
* Fix unused import
* Make caffeine default
* Address code comments
* Add more description around the jre version in the readme
* Add suggested comments
* Move emitters from io.druid.server.initialization to the dedicated io.druid.server.emitter package; Update emitter library to 0.6.0; Add support for ParametrizedUriEmitter; Support hierarical properties in JsonConfigurator (was needed for ParametrizedUriEmitter)
* Log created RequestLoggers
* Fix forbidden API
* Test fix
* More Http and Parametrized Http Emitter docs
* Switch to debug level
* fixes HttpServerInventoryView to call server/segment callbacks correctly and Unit Tests for the class
* fix checkstyle and forbidden-api errors
* HttpServerInventoryView to finish start() only after server inventory is initialized
* fix compilation errors
* address review comments
* add exponential backoff instead of fixed 5 secs on successive failures
* update test to exercise server fail scenarios
* use AtomicInteger for requestNum and increment only once
* Move scan-query from a contrib extension into core.
Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion
This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.
This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.
Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
* Adjustments from review.
* Add back Select query.
* Adjust SQL docs.
* Restore SelectQuery link.
* SQL: Full TRIM support.
- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING
* Remove unused import.
* Fix tests, add RTRIM / LTRIM.
* Remove unused imports.
* BTRIM and docs.
* Replace for with foreach.
* Use internal-discovery and http for talking to overlord/coordinator leaders
* CuratorDruidNodeDiscovery.getAllNodes() best effort 30 sec wait for cache initialization
* DruidLeaderClientProvider to eagerly instantiate DruidNodeDiscovery when needed so that DruidNodeDiscovery impl cache gets initialized well in time
* Revert "DruidLeaderClientProvider to eagerly instantiate DruidNodeDiscovery when needed so that DruidNodeDiscovery impl cache gets initialized well in time"
This reverts commit f1a2432614ba56ddc2d55fe47e990d17fcfd6129.
* add lifecycle to DruidLeaderClient to early initialize DruidNodeDiscovery so that it has its cache update well in time
Expose version property for so that it may serialize/deserialize correctly
Expose version property for `CustomVersioningPolicy` so that it may serialize/deserialize correctly
Expose version property for CustomVersioningPolicy so that it may serialize/deserialize correctly
Expose version property for `CustomVersioningPolicy` so that it may serialize/deserialize correctly
* DruidLeaderSelector interface for leader election and Curator based impl. DruidCoordinator/TaskMaster are updated to use the new interface.
* add fake DruidNode binding in integration-tests module
* add docs on DruidLeaderSelector interface
* remove start/stop and keep register/unregister Listener in DruidLeaderSelector interface
* updated comments on DruidLeaderSelector
* cache the listener executor in CuratorDruidLeaderSelector
* use same latch owner name that was used before
* remove stuff related to druid.zk.paths.indexer.leaderLatchPath config
* randomize the delay when giving up leadership and restarting leader latch
* Factor QueryableIndexColumnSelectorFactory and IncrementalIndexColumnSelectorFactory out of QueryableIndexStorageAdapter and IncrementalIndexStorageAdapter; Add Offset.getBaseReadableOffset(); Remove OffsetHolder interface; Replace Cursor extends ColumnSelectorFactory with composition; Reduce indirection in ColumnValueSelectors created by QueryableIndexColumnSelectorFactory
* Don't override clone() in FilteredOffset (the prev. implementation was broken); Some warnings fixed
* Simplify Cursors in QueryableIndexStorageAdapter
* Address comments
* Remove unused and unimplemented methods from GenericColumn interface
* Comments
* Add @ExtensionPoint and @PublicApi annotations.
* Clean up wording.
* Remove unused import.
* Remove unused imports.
* Only types can be extension points.
* Adjust annotations some more.
* Remove unused import.
* Make ServletFilterHolder an extension point.
* Add a couple extension points, and update docs.
* Do not remove segment that should not be moved from currentlyMovingSegments (segments are removed by callbacks or not inserted)
* Mark segments that are going to be dropped from server and use this information in CostBalancerStrategy
* Fix tests
* update LookupCoordinatorManager to use internal discovery to discover lookup nodes
* router:use internal-discovery to discover brokers
* minor [Curator]DruidDiscoveryProvider refactoring
* add initialized() method to DruidNodeDiscovery.Listener
* update HttpServerInventoryView to use initialized() and call segment callback initialization asynchronously
* Revert "update HttpServerInventoryView to use initialized() and call segment callback initialization asynchronously"
This reverts commit f796e441221fe8b0e9df87fdec6c9f47bcedd890.
* Revert "add initialized() method to DruidNodeDiscovery.Listener"
This reverts commit f0661541d073683f28fce2dd4f30ec37db90deb0.
* minor refactoring removing synchronized from DruidNodeDiscoveryProvider
* updated DruidNodeDiscovery.Listener contract to take List of nodes and first call marks initialization
* update HttpServerInventoryView to handle new contract and handle initialization
* router update to handle updated listener contract
* document DruidNodeDiscovery.Listener contract
* fix forbidden-api error
* change log level to info for unknown path children cache events in CuratorDruidNodeDiscoveryProvider
* announce broker only after segment inventory is initialized
* internal-discovery: interfaces for announcement/discovery, curator impls
* more tests
* address some review comments
* more fixes
* address more review comments
* simplify ObjectMapper setup in CuratorDruidNodeAnnouncerAndDiscoveryTest
* fix KafkaIndexTaskTest
* make lookupTier overridable via RealtimeIndexTask and KafkaIndexTask context
* make teamcity build happy
* Do not remove segment that should not be moved from currentlyMovingSegments (segments are removed by callbacks or not inserted)
* Replace putIfAbsent with computeIfAbsent in DruidBalancer
* Refactoring
* make BrokerQueryResource instantiation singleton
* fix druid.router.http.* handling so that they are actually used and introduce numRequestsQueued for jetty http client at router
* address comments
* address review comment
* Use ConcurrentHashMap to store segment servers or else getInventory() would need to clone the values list
* introduce unstableTimeout for segment servers
* address review comment
* add HttpServerInventoryViewConfigTest
* Add metrics to the native queries underpinning SQL.
This is done by factoring out the metrics and request log emitting
code from QueryResource into a new QueryLifecycle class. That class
is used by both QueryResource and the SQL DruidSchema and QueryMaker.
Also fixes a couple of bugs in QueryResource:
- RequestLogLine start time was set to `TimeUnit.NANOSECONDS.toMillis(startNs)`,
which is incorrect since absolute nanos cannot be converted to millis.
- DruidMetrics.makeRequestMetrics was called with null `query` on
unparseable queries, which led to spurious "Unable to log query"
errors.
Partial fix for #4047.
* Code style
* Remove unused imports.
* Fix tests.
* Remove unused import.
* Fixes and improvements to SQL metadata caching.
Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient.
SQL changes:
- Cache metadata on a per-segment level, in addition to per-dataSource, so
we don't need to re-query all segments whenever a single new one appears.
This should lower the load placed on the cluster by metadata queries.
- Fix race condition in DruidSchema that can cause us to miss metadata. It was
possible to notice new segments, then issue a query, and have that query
not actually hit those segments, and not notice that it didn't hit those segments.
Then, the metadata from those segments would be ignored.
- Fix assumption in DruidSchema that all segments are immutable. Now, mutable
segments are periodically re-queried.
- Fix inappropriate re-use of SchemaPlus. Now we create one for each planning
cycle, rather than sharing one. It caches table objects, which we want to
avoid, since it can cause stale metadata. We do the caching in DruidSchema
so we don't need the SchemaPlus caching.
Server changes:
- Add a TimelineCallback to TimelineServerView, for callers that want to get updates
when the timeline has been modified.
- Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This
allows it to accept queries that are segment-descriptor-based rather than
intervals-based. In particular it will now support MultipleSpecificSegmentSpec.
* Fix DruidSchema, and unused imports.
* Remove unused import.
* Fix SqlBenchmark.
* Rename ResourceManagementStrategy to ProvisioningStrategy, similarly for related classes. Make ProvisioningService non-global, created per RemoteTaskRunner instead. Add OverlordBlinkLeadershipTest.
* Fix RemoteTaskRunnerFactoryTest.testExecNotSharedBetweenRunners()
* Small fix
* Make SimpleProvisioner and PendingProvisioner more similar in details
* Fix executor name
* Style fixes
* Use LifecycleLock in RemoteTaskRunner
* Remove some unnecessary use of boxed types.
* Fix some incorrect format strings.
* Enable IDEA's MalformedFormatString inspection.
* Add a Checkstyle check for finding uses of incorrect logging packages.
* Fix some incorrect usages of the metamx logger.
* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.
* Fix some more places where the wrong number of arguments are provided to format strings.
* Suppress `MalformedFormatString` inspection on legacy logging test.
* Use @SuppressWarnings rather than a noinspection suppression comment.
* Fix some more incorrect format strings.
* Suppress some more incorrect format string warnings where the incorrect string is intentional.
* Log the aggregator when closing it fails.
* Remove some unneeded log lines.
* Early publishing segments in the middle of data ingestion
* Remove unnecessary logs
* Address comments
* Refactoring the patch according to #4292 and address comments
* Set the total shard number of NumberedShardSpec to 0
* refactoring
* Address comments
* Fix tests
* Address comments
* Fix sync problem of committer and retry push only
* Fix doc
* Fix build failure
* Address comments
* Fix compilation failure
* Fix transient test failure
* Avoid usages of Default system Locale and printing to System.out or System.err in production code
* Fix Charset in DruidKerberosUtil
* Remove redundant string format in GenericIndexed
* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()
* Fix testSafeFormat()
* More fixes of redundant StringUtils.format() inside ISE
* Rename unimportantSafeFormat() to nonStrictFormat()