* Replace EmittedBatchCounter and UpdateCounter with (both not safe for concurrent increments/updates) with ConcurrentAwaitableCounter (safe for concurrent increments)
* Fixes
* Fix EmitterTest
* Added Javadoc and make awaitCount() to throw exceptions on wrong count instead of masking errors
* Fix Kerberos Authentication failing requests without cookies.
KerberosAuthenticator was failing `First` request from the clients.
After authentication we were setting the cookie properly but not
setting the the authenticated flag in the request. This PR fixed that.
Additional Fixes -
* Removing of Unused SpnegoFilterConfig - replaced by
KerberosAuthenticator
* Unused internalClientKeytab and principal from KerberosAuthenticator
* Fix docs accordingly and add docs for configuring an escalated
client.
* Fix excluded path config behavior
* spelling correction
* Revert "spelling correction"
This reverts commit fb754b43d8.
* Revert "Fix excluded path config behavior"
This reverts commit 3901047769.
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR #5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Add support for task reports, upload reports to deep storage
* PR comments
* Better name for method
* Fix report file upload
* Use TaskReportFileWriter
* Checkstyle
* More PR comments
* Use the official aws-sdk instead of jet3t
* fix compile and serde tests
* address comments and fix test
* add http version string
* remove redundant dependencies, fix potential NPE, and fix test
* resolve TODOs
* fix build
* downgrade jackson version to 2.6.7
* fix test
* resolve the last TODO
* support proxy and endpoint configurations
* fix build
* remove debugging log
* downgrade hadoop version to 2.8.3
* fix tests
* remove unused log
* fix it test
* revert KerberosAuthenticator change
* change hadoop-aws scope to provided in hdfs-storage
* address comments
* address comments
* Future-proof some Guava usage
* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20
* Use `Collections.emptyIterator()`
* Pretty formatting
* Make listenable future transforms a thing in default druid
* Format fix
* Add forbidden guava apis
* Make the ListenableFutrues.transformAsync have comments
* Undo intellij bad pattern matching in comments
* Futrues --> Futures
* Add empty iterators forbidding
* Fix extra `A`
* Correct method signature
* Address review comments
* Finish Gian review comments
* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
and added datasource information as part of existing endpoint /druid/indexer/v1/runningTasks.
Added junit test cases for the newly implemented API and fixed existing junit test cases.
Fixed review comments - added new method getCreatedDateTimeAndDataSource into TaskStorageQueryAdapter class
and formatted changed files
Druid relies on the page cache of Linux in order to have memory segments.
However when loading segments from deep storage or rebalancing the page
cache can get poisoned by segments that should not be in memory yet.
This can significantly slow down Druid in case rebalancing happens
as data that might not be queried often is suddenly in the page cache.
This PR implements the same logic as is in Apache Cassandra and Apache
Bookkeeper.
Closes#4746
Fixes test failures reported in -
https://github.com/druid-io/druid/issues/4909
Issue is that If some test skips setting up Calcite system properties
with proper encoding and loads calcite classes that use that property,
All subsequent tests in the same JVM fails.
To reproduce the issue - ExpressionsTest and CalciteQueryTest from IDE
in this order.
A better fix would be to not use System Properties in calcite, This
will work for now.
All new Calcite Unit tests that are added need to inherit
CalciteTestBase.
* Fix early publishing to early pushing in batch indexing & refactor appenderatorDriver
* fix compile
* rename and add more javadocs
* Fix conflicts
* address comments
* revert await executors
* fix test
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.
Both were susceptible to the following conditions:
1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.
* Use StringUtils.format
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
* use reflection to call hadoop fs.rename to workaround different hadoop jar version in main and hdfs-storage extension class loader
* find rename method recursively
* Encrypting MySQL connections
* Update docs
* Make verifyServerCertificate a configurable parameter
* Change password parameter and doc update
* Make server certificate verification disabled by default
* Update tostring
* Update docs
* Add check for trust store passwords
* Add warning for null password
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
* Fix DataSegmentTest
* Refinements
* Try to fix
* Fix the second DataSegmentTest
* Nullability
* Fix tests
* Fix tests, unify to use TestHelper.getJsonMapper()
* Revert TestUtil as ServerTestHelper, fix tests
* Add newline
* Fix indexing tests
* Fix s3 tests
* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()
* Fix HDFS tests
* Fix HdfsDataSegmentPusherTest
* Capitalize constant names
* numeric quantiles sketch aggregator
* it seems that we need to synchronize all methods, which modify the state
* Seems like a false positive with -Pstrict
* code style fix
* code style fix
* use sketches-core-0.10.3
* moved cache ids to the central place
* better class names
* support large columns
* explained autodetection, added exception
* added comments regarding sketches moving on heap
* support reindexing
* implemented suggestions from jihoonson
* style fix
* use max(k, other.k) for better accuracy
* check for NilColumnValueSelector instead of null
* throw exceptions instead of providing no-op comparators
* Kerberos TGT will expire after some pre-determined time, this patch add relogin calls
Change-Id: I17ccb9b42aa3032de5d28c8c21e4ffbe8222b815
* exit if the first login passed
Change-Id: Ifefd5e9e0dd7d07b05cc493ab1f72415de557ec2
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments
* take care of review comments
* make getCurrentOffsets call async, keep track of publishing sequence, review comments
* fix setEndoffset duplicate request handling, formatting
* fix unit test
* backward compatibility
* make AppenderatorDriverMetadata backwards compatible
* add unit test
* fix deadlock between persist and push executors in AppenderatorImpl
* fix formatting
* use persist dir instead of work dir
* review comments
* fix deadlock
* actually fix deadlock
* Add compaction task
* added doc
* use combining aggregators
* address comments
* add support for dimensionsSpec
* fix getUniqueDims and getUniqueMetics
* find unique dimensionsSpec
* fix compilation
* add unit test
* fix test
* fix test
* test for different dimension orderings and types, and doc for type and ordering
* add control for custom ordering and type
* update doc
* fix compile
* fix compile
* add segments param
* fix serde error
* fix build
AppenderatorImpl already applies maxRowsInMemory across all sinks. So dividing by
the number of Kafka partitions is pointless and effectively makes the interpretation
of maxRowsInMemory lower than expected.
This undoes one of the two changes from #3284, which fixed the original bug twice.
In this, that's worse than fixing it once.
* Fix havingSpec on complex aggregators.
- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
related classes from druid-server to druid-processing. They probably make
more sense there anyway.
- Adds a SQL query test.
Fixes#4957.
* Remove unused import.
* Introduce "transformSpec" at ingest-time.
It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.
The "expression" fields use the same expression language as other
expression-based feature.
* Remove forbidden api.
* Fix compile error.
* Fix tests.
* Some more changes.
- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
and ingestSegment firehose.
* Fix bad merge.
* Adjust imports.
* Adjust whitespace.
* Make Transform into an interface.
* Add missing annotation.
* Switch logger.
* Switch logger.
* Adjust test.
* Adjustment to handling for DatasourceIngestionSpec.
* Fix test.
* CR comments.
* Remove unused method.
* Add javadocs.
* More javadocs, and always decorate.
* Fix bug in TransformingStringInputRowParser.
* Fix bad merge.
* Fix ISFF tests.
* Fix DORC test.
* Changes for lookup synchronization
* Refactor of Lookup classes
* Minor refactors and doc update
* Change coordinator instance to be retrieved by DruidLeaderClient
* Wait before thread shutdown
* Make disablelookups flag true by default
* Update docs
* Rename flag
* Move executorservice shutdown to finally block
* Update LookupConfig
* Refactoring and doc changes
* Remove lookup config constructor
* Revert Lookupconfig constructor changes
* Add tests to LookupConfig
* Make executorservice local
* Update LRM
* Move ListeningScheduledExecutorService to ExecutorCompletionService
* Move exception to outer block
* Remove check to see future is done
* Remove unnecessary assignment
* Add logging
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.
This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes#4203).
Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
and SqlOperatorConversions rather being special cases. This simplifies
the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.
Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
and DATE_TRUNC.
* Add missing @Override annotation.
* Adjustments for forbidden APIs.
* Adjustments for forbidden APIs.
* Disable GROUP BY alias.
* Doc reword.
* adding new post aggregators of test stats to druid-stats extension
* changes to address code review comments
* fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master
* add @Override annotation per CI log
* make changes per review comments/discussions
* remove some blocks per review comments
* remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form
* sanitize output of /druid/coordinator/v1/cluster endpoint
* Added org.joda.time.DateTime#(java.lang.String) to forbidden API.
* Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API.
* Add additional APIs that may create DateTime with default time zone
* Add helper function that accepts formatter to parse String.
* Add additional forbidden APIs
* Replace existing usage of forbidden APIs
* Use wrapper class to enforce Chronology on DateTimeFormatter.
* Creates constant UtcFormatter for constant ISODateTimeFormat.
* Add flattenSpec support to the Avro parser.
Also:
- Refactor the JSONPathParser a bit so it can share flattening code
with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
extension no longer uses it.
* Fix indentation.
* Fix equals/hashCode.
* Move caffeine out of extension.
* Remove `JsonTypeName` from the class itself
* Fix bad docs
* Fix distribution pom
* Fix unused import
* Make caffeine default
* Address code comments
* Add more description around the jre version in the readme
* Add suggested comments