* perf: provide a custom utf8 specific buffered line iterator (benchmark)
Benchmark Mode Cnt Score Error Units
JsonLineReaderBenchmark.baseline avgt 15 3459.871 ± 106.175 us/op
* perf: provide a custom utf8 specific buffered line iterator
Benchmark Mode Cnt Score Error Units
JsonLineReaderBenchmark.baseline avgt 15 3022.053 ± 51.286 us/op
* perf: provide a custom utf8 specific buffered line iterator (more tests)
* perf: provide a custom utf8 specific buffered line iterator (pr feedback)
Ensure field visibility is as limited as possible
Null check for buffer in constructor
* perf: provide a custom utf8 specific buffered line iterator (pr feedback)
Remove additional 'finished' variable.
* perf: provide a custom utf8 specific buffered line iterator (more tests and bugfix)
Druid catalog basics
Catalog object model for tables, columns
Druid metadata DB storage (as an extension)
REST API to update the catalog (as an extension)
Integration tests
Model only: no planner integration yet
* Support for middle manager less druid, tasks launch as k8s jobs
* Fixing forking task runner test
* Test cleanup, dependency cleanup, intellij inspections cleanup
* Changes per PR review
Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support
* Removing un-needed log lines
* Small changes per PR review
* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly
* Merge conflict fix
* Fixing tests and docs
* update tiny-cluster.yaml
changed `enableTaskLevelLogPush` to `encapsulatedTask`
* Apply suggestions from code review
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Minor changes per PR request
* Cleanup, adding test to AbstractTask
* Add comment in peon.sh
* Bumping code coverage
* More tests to make code coverage happy
* Doh a duplicate dependnecy
* Integration test setup is weird for k8s, will do this in a different PR
* Reverting back all integration test changes, will do in anotbher PR
* use StringUtils.base64 instead of Base64
* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Update Snappy to 1.1.8.4.
Prior to this, because snappy-java wasn't included in dependencyManagement,
we actually shipped multiple different versions for different extensions,
ranging from 1.1.7.1 to 1.1.8.4. Now, we standardize on 1.1.8.4.
Among other things, this enables the tests to pass on M1 Macs.
* Update snappy-java versions in licenses.yaml.
* Add interpolation to JsonConfigurator
* Fix checkstyle
* Fix tests by removing common-text override
* Add back commons-text without version
* Remove unused hadoopDir configs
* Move some stuff to hopefully pass coverage
Compressed Big Decimal is an extension which provides support for
Mutable big decimal value that can be used to accumulate values
without losing precision or reallocating memory. This type helps in
absolute precision arithmetic on large numbers in applications,
where greater level of accuracy is required, such as financial
applications, currency based transactions. This helps avoid rounding
issues where in potentially large amount of money can be lost.
Accumulation requires that the two numbers have the same scale,
but does not require that they are of the same size. If the value
being accumulated has a larger underlying array than this value
(the result), then the higher order bits are dropped, similar to what
happens when adding a long to an int and storing the result in an
int. A compressed big decimal that holds its data with an embedded
array.
Compressed big decimal is an absolute number based complex type
based on big decimal in Java. This supports all the functionalities
supported by Java Big Decimal. Java Big Decimal is not mutable in
order to avoid big garbage collection issues. Compressed big decimal
is needed to mutate the value in the accumulator.
The docs say Java 17 support is experimental, and give tips on running
successfully with Java 17.
This patch also removes java.base/jdk.internal.perf and
jdk.management/com.sun.management.internal from the list of required
exports and opens, because they were formerly needed for JvmMonitor,
which was rewritten in #12481 to use MXBeans instead.
This dependency was no longer needed after #12481, but remained because
it was used for a (now useless) test. This patch removes the test and
the dependency.
This commit is a first draft of the revised integration test framework which provides:
- A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.)
- Maven module druid-it-tools to hold code placed into the Docker image.
- Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.)
- Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.
The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported
Java versions, garbage collection, and strong encapsulation..
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so
tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility
as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not
just Exceptions. This is important because an IllegalAccessError is
encountered when the correct "--add-opens" line is not provided,
which would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray
instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments.
* Use run-java in more places.
* Add run-java.
* Update .gitignore.
* Exclude hadoop-client-api.
Brought in when building on Java 17.
* Swap one more usage of java.
* Fix the run-java script.
* Fix flag.
* Include link to Temurin.
* Spelling.
* Update examples/bin/run-java
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Frame format for data transfer and short-term storage.
As we move towards query execution plans that involve more transfer
of data between servers, it's important to have a data format that
provides for doing this more efficiently than the options available to
us today.
This patch adds:
- Columnar frames, which support fast querying.
- Row-based frames, which support fast sorting via memory comparison
and fast whole-row copies via memory copying.
- Frame files, a container format that can be stored on disk or
transferred between servers.
The idea is we should use row-based frames when data is expected to
be sorted, and columnar frames when data is expected to be queried.
The code in this patch is not used in production yet. Therefore, the
patch involves minimal changes outside of the org.apache.druid.frame
package. The main ones are adjustments to SqlBenchmark to add benchmarks
for queries on frames, and the addition of a "forEach" method to Sequence.
* Fixes based on tests, static analysis.
* Additional fixes.
* Skip DS mapping tests on JDK 14+
* Better JDK checking in tests.
* Fix imports.
* Additional comment.
* Adjustments from code review.
* Update test case.
* jvm gc to mxbeans
* add zgc and shenandoah #12476
* remove tryCreateGcCounter
* separate the space collector
* blend GcGenerationCollector into GcCollector
* add jdk surefire argLine
* Poison StupidPool and fix resource leaks
There are various resource leaks from test setup as well as some
corners in query processing. We poison the StupidPool to start failing
tests when the leaks come and fix any issues uncovered from that so
that we can start from a clean baseline.
Unfortunately, because of how poisoning works,
we can only fail future checkouts from the same pool,
which means that there is a natural race between a
leak happening -> GC occurs -> leak detected -> pool poisoned.
This race means that, depending on interleaving of tests,
if the very last time that an object is checked out
from the pool leaks, then it won't get caught.
At some point in the future, something will catch it,
however and from that point on it will be deterministic.
* Remove various things left over from iterations
* Clean up FilterAnalysis and add javadoc on StupidPool
* Revert changes to .idea/misc.xml that accidentally got pushed
* Style and test branches
* Stylistic woes
This commit contains changes made to the existing ITs to support the new ITs.
Changes:
- Make the "custom node role" code usable by the new ITs.
- Use flag `-DskipITs` to skips the integration tests but runs unit tests.
- Use flag `-DskipUTs` skips unit tests but runs the "new" integration tests.
- Expand the existing Druid profile, `-P skip-tests` to skip both ITs and UTs.
* Adding zstandard compression library
* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.
* Fixing zstandard version to latest stable version in pom's and updating license files
* Removing zstd from benchmarks and adding to processing (poms)
* fix the intellij inspection issue
* Removing the prefix v for the version in the license check for ztsd
* Fixing license checks
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
* Ensure ByteBuffers allocated in tests get freed.
Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.
To fix this:
1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
This makes it easy to free the direct buffer. Currently, it's only used
in tests, because production code seems OK.
2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
reason for the buffer to be off-heap). Make sure to close all direct
holders when done.
* Changes based on CI results.
* A different approach.
* Roll back BitmapOperationTest stuff.
* Try additional surefire memory.
* Revert "Roll back BitmapOperationTest stuff."
This reverts commit 49f846d9e3.
* Add TestBufferPool.
* Revert Xmx change in tests.
* Better behaved NestedQueryPushDownTest. Exit tests on OOME.
* Fix TestBufferPool.
* Remove T1C from ARM tests.
* Somewhat safer.
* Fix tests.
* Fix style stuff.
* Additional debugging.
* Reset null / expr configs better.
* ExpressionLambdaAggregatorFactory thread-safety.
* Alter forkNode to try to get better info when a JVM crashes.
* Fix buffer retention in ExpressionLambdaAggregatorFactory.
* Remove unused import.
* Add ipaddress library as dependency.
* IPv4 functions to use the inet.ipaddr package.
* Remove unused imports.
* Add new function.
* Minor rename.
* Add more unit tests.
* IPv4 address expr utils unit tests and address options.
* Adjust the IPv4Util functions.
* Move the UTs a bit around.
* Javadoc comments.
* Add license info for IPAddress.
* Fix groupId, artifact and version in license.yaml.
* Remove redundant subnet in messages - fixes UT.
* Remove unused commons-net dependency for /processing project.
* Make class and methods public so it can be accessed.
* Add initial version of benchmark
* Add subnetutils package for benchmarks.
* Auto generate ip addresses.
* Add more v4 address representations in setup to avoid bias.
* Use ThreadLocalRandom to avoid forbidden API usage.
* Adjust IPv4AddressBenchmark to adhere to codestyle rules.
* Update ipaddress library to latest 5.3.4
* Add ipaddress package dependency to benchmarks project.
* JvmMonitor: Handle more generation and collector scenarios.
ZGC on Java 11 only has a generation 1 (there is no 0). This causes
a NullPointerException when trying to extract the spacesCount for
generation 0. In addition, ZGC on Java 15 has a collector number 2
but no spaces in generation 2, which breaks the assumption that
collectors always have same-numbered spaces.
This patch adjusts things to be more robust, enabling the JvmMonitor
to work properly for ZGC on both Java 11 and 15.
* Test adjustments.
* Improve surefire arglines.
* Need a placeholder
The latest version of Error Prone now requires Java 11. Upgrading means we can
remove a lot of the maven profile complexity required to run checks with Java 8.
This also requires switching our strict build to use Java 11.
* update error-prone to 2.11
* remove need for specific maven profiles for Java 8 and Java 15
* fix additional Error Prone warnings with Java 11
* update strict build to use Java 11
These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past.
* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
* upgrade Airline to Airline 2
https://github.com/airlift/airline is no longer maintained, updating to
https://github.com/rvesse/airline (Airline 2) to use an actively
maintained version, while minimizing breaking changes.
Note, this is a backwards incompatible change, and extensions relying on
the CliCommandCreator extension point will also need to be updated.
* fix dependency checks where jakarta.inject is now resolved first instead
of javax.inject, due to Airline 2 using jakarta
* Remove use of deprecated PMD ruleset
This fixes annoying warnings we were getting during build.
- Use a custom PMD ruleset, since the built-in one uses deprecated rules.
- UnnecessaryImport replaces most of the deprecated rules
- Update maven-pmd-plugin to 3.15
- Exclude ancient asm version from caliper, since this was causing
incompatibility warnings with PMD and could also affect our tests runs
in unexpected ways
* replace deprecated forbidden-apis config failOnUnresolvableSignatures
with ignoreSignaturesOfMissingClasses which avoids warnings for
classes not present in a particular sub-module
* fix incorrect signature for Files.createTempDirectory
* complex typed expressions
* add built-in hll collector expressions to get coverage on druid-processing, more types, more better
* rampage!!!
* more javadoc
* adjustments
* oops
* lol
* remove unused dependency
* contradiction?
* more test
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
* latest datasketches-java and datasketches-memory
* updated versions of datasketches-java and datasketches-memory
Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
* support using mariadb connector with mysql extensions
* cleanup and more tests
* fix test
* javadocs, more tests, etc
* style and more test
* more test more better
* missing pom
* more pom
Switching to the bom dependency declaration simplifies managing jackson
dependencies. It also removes the need to override individual library
versions for CVE fixes, since the bom takes care of that internally.
This change aligns our jackson dependency versions on 2.10.5(.x):
- updates jackson libraries from 2.10.2 to 2.10.5
- jackson-databind remains at 2.10.5.1 as defined in the bom
Release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10
* upgrade error-prone to 2.7.1 and support checks with Java 11+
- upgrade error-prone to 2.7.1
- support running error-prone with Java 11 and above using -Xplugin
instead of custom compiler
- add compiler arguments to ignore warnings/errors in Java 15/16
- introduce strictCompile property to enable strict profiles since we
now need multiple strict profiles for Java 8
- properly exclude all generated source files from error-prone
- fix druid-processing overriding annotation processors from parent pom
- fix druid-core disabling most non-default checks
- align plugin and annotation errorprone versions
- fix / suppress additional issues found by error-prone:
* fix bug in SeekableStreamSupervisor initializing ArrayList size with
the taskGroupdId
* fix missing @Override annotations
- remove outdated compiler plugin in benchmarks
- remove deleted ParameterPackage error-prone rule
- re-enable checks on benchmark module as well
* fix IntelliJ inspections
* disable LongFloatConversion due to bug in error-prone with JDK 8
* add comment about InsecureCrypto
With this change, Druid will only support ZooKeeper 3.5.x and later.
In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x
(see #10780 for the detailed reasons)
* remove ZooKeeper 3.4.x compatibility
* exclude additional ZK 3.5.x netty dependencies to ensure we use our version
* keep ZooKeeper version used for integration tests in sync with client library version
* remove the need to specify ZK version at runtime for docker
* add support to run integration tests with JDK 15
* build and run unit tests with Java 15 in travis
* upgrade to Apache Kafka 2.8.0 (release notes:
https://downloads.apache.org/kafka/2.8.0/RELEASE_NOTES.html)
* pass Kafka version as a Docker argument in integration tests
to keep in sync with maven version
* fix use of internal Kafka APIs in integration tests
* Update some dev dependencies, prettify, tslint-fix
* Sort tsconfig keys for easy comparison
* Set noImplicitThis
* Slightly more accurate types
* Bump Jest and related
* Bump react to latest on v16
* Bump node-sass, sass-loader for node14 support
* Remove node-sass-chokidar (unused)
* More unused dependencies
* Fix blueprint imports
* Webpack 5
* Update webpack config for 'process' usage
* Update playwright-chromium
* Emit esnext modules for tree shaking
* Enable source maps in development
* Dedupe
* Bump babel and things
* npm audit fix
* Add .editorconfig file to match prettier settings
* Update licenses (tslib is 0BSD as of 1.11.2)
https://github.com/microsoft/tslib/pull/96
* Require node >= 10
* Use Node 10 to run e2e tests
* Use 'ws' transport mode for dev server (will be default in next version)
* Remove an 'any'
* No sourcemaps in prod
* Exclude .editorconfig from license checks
* Try nvm for setting node version