* Use partitionsSpec for all task types
* fix doc
* fix typos and revert to use isPushRequired
* address comments
* move partitionsSpec to core
* remove hadoopPartitionsSpec
* firehose doc adjustments
* fix typo
* additional information on parser types in ingestion docs
* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages
* fixit
* make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers
* transforms
* fix tests
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.
* Fix licenses and dependencies
* Fix licenses and dependencies again
* Fix integration test dependency
* Address review comments
* Fix unit test dependencies
* Fix integration test dependency
* Fix integration test dependency again
* Fix integration test dependency third time
* Fix integration test dependency fourth time
* Fix compile error
* Fix assert package
* remove unecessary lock in ForegroundCachePopulator leading to a lot of contention
* mutableboolean, javadocs,document some cache configs that were missing
* more doc stuff
* adjustments
* remove background documentation
* add CachingClusteredClient benchmark, refactor some stuff
* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select
* adjust javadoc
* adjustments
* oops
* use it
* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments
* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
* #7858 Throwing UnsupportedOperationException from ImmutableDruidDataSource's equals() and hashCode() methods.
* 1. Turning ImmutableDruidDataSource into a data container. 2. Adding a Util method to be used in tests for checking equality of ImmutableDruidDataSource objects.
* Removing unused method
* Fixing assert equals
* Fixing assert equals in TestUtils.java
* Adding java doc comments, Using ExpectedException in tests
* Fixing test cases
* Fixed expected exception message in tests
* fixed line width
* line width fix
* code style fixes
* code indentation fixes
* fixing method name
* doc updates and changes to use the CollectionUtils.mapValues utility method
* Add Structural Search patterns to intelliJ
* refactoring from PR comments
* put -> putIfAbsent
* do single key lookup
* Add inline firehose
To allow users to quickly parsing and schema, add a firehose that reads
data that is inlined in its spec.
* Address review comments
* Remove suppression of sonar warnings
* disable all compression in intermediate segment persists while ingestion
* more changes and build fix
* by default retain existing indexingSpec for intermediate persisted segments
* document indexSpecForIntermediatePersists index tuning config
* fix build issues
* update serde tests
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.
Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
Assert.assertEquals() were flipped)
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()
* fixing indentation
* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate
* fixing checkstyle
* Changing the recommendation text
* Reverting auto changes made by IDE
* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap
* Removing unused import
* Add state and error tracking for seekable stream supervisors
* Fixed nits in docs
* Made inner class static and updated spec test with jackson inject
* Review changes
* Remove redundant config param in supervisor
* Style
* Applied some of Jon's recommendations
* Add transience field
* write test
* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()
* remove transience reporting and fix SeekableStreamSupervisorStateManager impl
* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests
* remove stateHistory because it wasn't adding much value, some fixes, and add more tests
* fix tests
* code review changes and add HTTP health check status
* fix test failure
* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager
* fixup after merge
* code review changes - add additional docs
* cleanup KafkaIndexTaskTest
* add additional documentation for Kinesis indexing
* remove unused throws class
This includes the router, overlord, middleManager, and coordinator.
Does the following things:
- Loads LookupSerdeModule on MM, overlord, and coordinator.
- Adds LookupExprMacro to LookupSerdeModule, which allows these node
types to understand that the 'lookup' function exists.
- Adds a test to make sure that LookupSerdeModule works for virtual
columns, filters, transforms, and dimension specs.
This is implementing the technique discussed on these two issues:
- https://github.com/apache/incubator-druid/issues/7724#issuecomment-494723333
- https://github.com/apache/incubator-druid/pull/7082#discussion_r264888771
* Add checkstyle for "Local variable names shouldn't start with capital"
* Adjust some local variables to constants
* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
* fix lookup editor to use lookup tiers instead of historical tiers
* use default tier if empty response, fix if configured lookups is null
* fixes
* fix typo
* Bump Checkstyle to 8.20
Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle
Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.
Affected versions: < 8.18
* Oops, missed one
* Oops, missed a few
* Set direct memory if unable to detect JVM config
Java 9 and above prevents us from detecting the maximum available direct
memory.
This change adds a fallback method to use at most 25% of maximum heap
size, which should be a reasonable default.
Unless -XX:MaxDirectMemorySize is set, recent JVMs will default maximum
direct memory to match the maximum heap size, so this should work out of
the box in most cases. For completeness we print instructions in the log
to explain how to adjust settings if necessary.
* skip test rather than succeeding
* reword log message
Co-Authored-By: Himanshu <g.himanshu@gmail.com>
* update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support
* update tests to use new easymock interfaces
* fix tests failing due to easymock fixes
* remove dependency on jmockit
* fix race condition in ResourcePoolTest
* V1 - improve parallelism of zookeeper based segment change processing
* Create zk nodes in batches. Address code review comments.
Introduce various configs.
* Add documentation for the newly added configs
* Fix test failures
* Fix more test failures
* Remove prinstacktrace statements
* Address code review comments
* Use a single queue
* Address code review comments
Since we have a separate load peon for every historical, just having a single SegmentChangeProcessor
task per historical is enough. This commit also gets rid of the associated config druid.coordinator.loadqueuepeon.curator.numCreateThreads
* Resolve merge conflict
* Fix compilation failure
* Remove batching since we already have a dynamic config maxSegmentsInNodeLoadingQueue that provides that control
* Fix NPE in test
* Remove documentation for configs that are no longer needed
* Address code review comments
* Address more code review comments
* Fix checkstyle issue
* Address code review comments
* Code review comments
* Add back monitor node remove executor
* Cleanup code to isolate null checks and minor refactoring
* Change param name since it conflicts with member variable name
* sampler initial check-in
fix checkstyle issues
add sampler fix to process CSV files from cache properly
change to composition and rename some classes
add tests and report num rows read and indexed
remove excludedByFilter flag and don't send filtered out data
fix tests to handle both settings for druid.generic.useDefaultValueForNull
* wrap sampler firehose in TimedShutoffFirehoseFactory to support timeouts
* code review changes - add additional comments, limit maxRows
* EmitterModule: Throw an error on invalid emitter types.
The current behavior of silently using the "noop" emitter is unhelpful
and makes it difficult to debug config typos.
* Add comments.
* BaseAppenderatorDriver: Fix potentially overeager segment cleanup.
Here is a thing that I think can go wrong:
1. We push some segments, then try to publish them transactionally.
2. The segments are actually published, but the 200 OK response gets
lost (connection dropped, whatever).
3. We try again, and on the second try, the publish fails (because
the transaction baseline start metadata no longer matches).
4. Because the publish failed, we delete the pushed segments.
5. But this is bad, because the publish didn't really fail, it actually
succeeded in step 2.
I haven't seen this in the wild, but thought about it while
reviewing #7537.
This patch also cleans up logging a bit, making it more accurate and
somewhat less chatty.
* Avoid wrapping exceptions when not necessary.
* Initial commit
* Added test for int to long conversion
* Add appenderator test for realtime scan query
* get rid of todo
* Fix forbidden apis
* Jon's recommendations
* Formatting
* Add reload by interval API
Implements the reload proposal of #7439
Added tests and updated docs
* PR updates
* Only build timeline with required segments
Use 404 with message when a segmentId is not found
Fix typo in doc
Return number of segments modified.
* Fix checkstyle errors
* Replace String.format with StringUtils.format
* Remove return value
* Expand timeline to segments that overlap for intervals
Restrict update call to only segments that need updating.
* Only add overlapping enabled segments to the timeline
* Some renames for clarity
Added comments
* Don't rely on cached poll data
Only fetch required information from DB
* Match error style
* Merge and cleanup doc
* Fix String.format call
* Add unit tests
* Fix unit tests that check for overshadowing
* Add api to drop data by interval
* update to address comments
* unused imports
* PR comments + add tests in SQLMetadataSegmentManagerTest
* update tests and docs
* Add SegmentDescriptor interval in the hash while calculating Etag
* Add computeResultLevelCacheKey to CacheStrategy
Make HavingSpec cacheable and implement getCacheKey for subclasses
Add unit tests for computeResultLevelCacheKey
* Add more tests
* Use CacheKeyBuilder for HavingSpec's getCacheKey
* Initialize aggregators map to avoid NPE
* adjust cachekey builder for HavingSpec to ignore aggregators
* unused import
* PR comments
Java 9 removed support for sun.misc.Cleaner in favor of
java.lang.ref.Cleaner. This change adds a thin abstraction to switch
between Cleaner implementations based on JDK version at runtime
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication
* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials
* Fix checkstyle failure in HttpFirehoseFactory.java
* Update docs and fix TeamCity build with required noinspection
* Indentation cleanup and logic modification for HttpFirehose object stream
* Remove default Empty string password provider in http firehose
* Add JavaDoc for MixIn describing its intended use
* Reverting documentation notation for json code to be inline with rest of doc
* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs
* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
Removes the coordinator sanity check that prevents it from dropping all
segments. It's useful to get rid of this, since the behavior is
unintuitive for dev/testing clusters where users might regularly want
to drop all their data to get back to a clean slate.
But the sanity check was there for a reason: to prevent a race condition
where the coordinator might drop all segments if it ran before the
first metadata store poll finished. This patch addresses that concern
differently, by allowing methods in MetadataSegmentManager to return
null if a poll has not happened yet, and canceling coordinator runs
in that case.
This patch also makes the "dataSources" reference in
SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile
before, but it seems necessary to me: it's not final, and it's dereferenced
from multiple threads without synchronization.
* refactor lookups to be more chill to router
* remove accidental change
* fix and combine LookupIntrospectionResourceTest
* fix inspection
* rename RouterLookupModule to LookupSerdeModule and RouterLookupExtractorFactoryContainerProvider to NoopLookupExtractorFactoryContainerProvider
* make comment generic
* use ConfigResourceFilter instead of StateResourceFilter
* fix indentation
* unused import
* another unused import
* refactor some stuff into processing module, split up LookupModule.java classes into their own files
* Fix two issues with Coordinator -> Overlord communication.
1) ClientCompactQuery needs to recognize the potential for 'intervals'
to be set instead of 'segments'. The lack of this led to a
NullPointerException on DruidCoordinatorSegmentCompactor.java:102.
2) In two locations (DruidCoordinatorSegmentCompactor,
DruidCoordinatorCleanupPendingSegments) tasks were being retrieved
using waiting/pending/running tasks in the wrong order: by checking
'running' first and then 'pending', tasks could be missed if they
moved from 'pending' to 'running' in between the two calls. Replaced
these methods with calls to 'getActiveTasks', a new method that does
the calls in the right order.
* Remove unused import.
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion
* Code review feedback
- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)
* Add missing @JsonProperty
* Integration test passes
* Add unit test
* Remove two FIXME comments from CompactionTask
I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.
* Updates from code review
* Update init
Fix bin/init to source from proper directory.
* Fix for Proposal #6518: Shutdown druid processes upon complete loss of ZK connectivity
* Zookeeper Loss:
- Add feature documentation
- Cosmetic refactors
- Variable extractions
- Remove getter
* - Change config key name and reword documentation
- Switch from Function<Void,Void> to Runnable/Lambda
- try { … } finally { … }
* Fix line length too long
* - change to formatted string for logging
- use System.err.println after lifecycle stops
* commenting on makeEnsembleProvider()-created Zookeeper termination
* Add javadoc
* added java doc reference back to apache discussion thread.
* move comment to other class
* favor two-slash comments instead of multiline comments
* maxTotalRows should be checked in DataSourceCompactionConfig before setting targetCompactionSizeBytes
* remove unnecessary default values
* remove flacky test
* fix build
* Add comments
* Moved Scan Builder to Druids class and started on Scan Benchmark setup
* Need to form queries
* It runs.
* Stuff for time-ordered scan query
* Move ScanResultValue timestamp comparator to a separate class for testing
* Licensing stuff
* Change benchmark
* Remove todos
* Added TimestampComparator tests
* Change number of benchmark iterations
* Added time ordering to the scan benchmark
* Changed benchmark params
* More param changes
* Benchmark param change
* Made Jon's changes and removed TODOs
* Broke some long lines into two lines
* nit
* Decrease segment size for less memory usage
* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order
* Wrote more tests for scan result value sort
* Committing a param change to kick teamcity
* Fixed codestyle and forbidden API errors
* .
* Improved conciseness
* nit
* Created an error message for when someone tries to time order a result
set > threshold limit
* Set to spaces over tabs
* Fixing tests WIP
* Fixed failing calcite tests
* Kicking travis with change to benchmark param
* added all query types to scan benchmark
* Fixed benchmark queries
* Renamed sort function
* Added javadoc on ScanResultValueTimestampComparator
* Unused import
* Added more javadoc
* improved doc
* Removed unused import to satisfy PMD check
* Small changes
* Changes based on Gian's comments
* Fixed failing test due to null resultFormat
* Added config and get # of segments
* Set up time ordering strategy decision tree
* Refactor and pQueue works
* Cleanup
* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues
* WIP
* Sequence stuff is so dirty :(
* Fixed bug introduced by replacing deque with list
* Wrote docs
* Multi-historical setup works
* WIP
* Change so batching only occurs on broker for time-ordered scans
Restricted batching to broker for time-ordered queries and adjusted
tests
Formatting
Cleanup
* Fixed mistakes in merge
* Fixed failing tests
* Reset config
* Wrote tests and added Javadoc
* Nit-change on javadoc
* Checkstyle fix
* Improved test and appeased TeamCity
* Sorry, checkstyle
* Applied Jon's recommended changes
* Checkstyle fix
* Optimization
* Fixed tests
* Updated error message
* Added error message for UOE
* Renaming
* Finish rename
* Smarter limiting for pQueue method
* Optimized n-way merge strategy
* Rename segment limit -> segment partitions limit
* Added a bit of docs
* More comments
* Fix checkstyle and test
* Nit comment
* Fixed failing tests -> allow usage of all types of segment spec
* Fixed failing tests -> allow usage of all types of segment spec
* Revert "Fixed failing tests -> allow usage of all types of segment spec"
This reverts commit ec470288c7.
* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"
This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.
* Check type of segment spec before using for time ordering
* Fix bug in numRowsScanned
* Fix bug messing up count of rows
* Fix docs and flipped boolean in ScanQueryLimitRowIterator
* Refactor n-way merge
* Added test for n-way merge
* Refixed regression
* Checkstyle and doc update
* Modified sequence limit to accept longs and added test for long limits
* doc fix
* Implemented Clint's recommendations
* Move GCP to a core extension
* Don't provide druid-core >.<
* Keep AWS and GCP modules separate
* Move AWSModule to its own module
* Add aws ec2 extension and more modules in more places
* Fix bad imports
* Fix test jackson module
* Include AWS and GCP core in server
* Add simple empty method comment
* Update version to 15
* One more 0.13.0-->0.15.0 change
* Fix multi-binding problem
* Grep for s3-extensions and update docs
* Update extensions.md
* Fix exclusivity for start offset in kinesis indexing service
* some adjustment
* Fix SeekableStreamDataSourceMetadata
* Add missing javadocs
* Add missing comments and unit test
* fix SeekableStreamStartSequenceNumbers.plus and add comments
* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue
* Add javadocs
* fix compilation
* fix test
* remove unused variable
* Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality
* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()
* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()
* Style
* typo
* Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives
* Fixes
* Throw caught exception.
* Throw caught exceptions.
* Related checkstyle rule is added to prevent further bugs.
* RuntimeException() is used instead of Throwables.propagate().
* Missing import is added.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.
* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.
* Throwable is kept before firing a Runtime Exception.
* Fix unused assignments.