* Frame processing and channels.
Follow-up to #12745. This patch adds three new concepts:
1) Frame channels are interfaces for doing nonblocking reads and writes
of frames.
2) Frame processors are interfaces for doing nonblocking processing of
frames received from input channels and sent to output channels.
3) Cluster-by keys, which can be used for sorting or partitioning.
The patch also adds SuperSorter, a user of these concepts, both to
illustrate how they are used, and also because it is going to be useful
in future work.
Central classes:
- ReadableFrameChannel. Implementations include
BlockingQueueFrameChannel (in-memory channel that implements both interfaces),
ReadableFileFrameChannel (file-based channel),
ReadableByteChunksFrameChannel (byte-stream-based channel), and others.
- WritableFrameChannel. Implementations include BlockingQueueFrameChannel
and WritableStreamFrameChannel (byte-stream-based channel).
- ClusterBy, a sorting or partitioning key.
- FrameProcessor, nonblocking processor of frames. Implementations include
FrameChannelBatcher, FrameChannelMerger, and FrameChannelMuxer.
- FrameProcessorExecutor, an executor service that runs FrameProcessors.
- SuperSorter, a class that uses frame channels and processors to
do parallel external merge sort of any amount of data (as long as there
is enough disk space).
* Additional tests, fixes.
* Changes from review.
* Better implementation for ReadableInputStreamFrameChannel.
* Rename getFrameFileReference -> newFrameFileReference.
* Add InterruptedException to runIncrementally; add more tests.
* Cancellation adjustments.
* Review adjustments.
* Refactor BlockingQueueFrameChannel, rename doneReading and doneWriting to close.
* Additional changes from review.
* Additional changes.
* Fix test.
* Adjustments.
* Adjustments.
* Frame format for data transfer and short-term storage.
As we move towards query execution plans that involve more transfer
of data between servers, it's important to have a data format that
provides for doing this more efficiently than the options available to
us today.
This patch adds:
- Columnar frames, which support fast querying.
- Row-based frames, which support fast sorting via memory comparison
and fast whole-row copies via memory copying.
- Frame files, a container format that can be stored on disk or
transferred between servers.
The idea is we should use row-based frames when data is expected to
be sorted, and columnar frames when data is expected to be queried.
The code in this patch is not used in production yet. Therefore, the
patch involves minimal changes outside of the org.apache.druid.frame
package. The main ones are adjustments to SqlBenchmark to add benchmarks
for queries on frames, and the addition of a "forEach" method to Sequence.
* Fixes based on tests, static analysis.
* Additional fixes.
* Skip DS mapping tests on JDK 14+
* Better JDK checking in tests.
* Fix imports.
* Additional comment.
* Adjustments from code review.
* Update test case.
* jvm gc to mxbeans
* add zgc and shenandoah #12476
* remove tryCreateGcCounter
* separate the space collector
* blend GcGenerationCollector into GcCollector
* add jdk surefire argLine
* Poison StupidPool and fix resource leaks
There are various resource leaks from test setup as well as some
corners in query processing. We poison the StupidPool to start failing
tests when the leaks come and fix any issues uncovered from that so
that we can start from a clean baseline.
Unfortunately, because of how poisoning works,
we can only fail future checkouts from the same pool,
which means that there is a natural race between a
leak happening -> GC occurs -> leak detected -> pool poisoned.
This race means that, depending on interleaving of tests,
if the very last time that an object is checked out
from the pool leaks, then it won't get caught.
At some point in the future, something will catch it,
however and from that point on it will be deterministic.
* Remove various things left over from iterations
* Clean up FilterAnalysis and add javadoc on StupidPool
* Revert changes to .idea/misc.xml that accidentally got pushed
* Style and test branches
* Stylistic woes
* Adding zstandard compression library
* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.
* Fixing zstandard version to latest stable version in pom's and updating license files
* Removing zstd from benchmarks and adding to processing (poms)
* fix the intellij inspection issue
* Removing the prefix v for the version in the license check for ztsd
* Fixing license checks
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
* Ensure ByteBuffers allocated in tests get freed.
Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.
To fix this:
1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
This makes it easy to free the direct buffer. Currently, it's only used
in tests, because production code seems OK.
2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
reason for the buffer to be off-heap). Make sure to close all direct
holders when done.
* Changes based on CI results.
* A different approach.
* Roll back BitmapOperationTest stuff.
* Try additional surefire memory.
* Revert "Roll back BitmapOperationTest stuff."
This reverts commit 49f846d9e3.
* Add TestBufferPool.
* Revert Xmx change in tests.
* Better behaved NestedQueryPushDownTest. Exit tests on OOME.
* Fix TestBufferPool.
* Remove T1C from ARM tests.
* Somewhat safer.
* Fix tests.
* Fix style stuff.
* Additional debugging.
* Reset null / expr configs better.
* ExpressionLambdaAggregatorFactory thread-safety.
* Alter forkNode to try to get better info when a JVM crashes.
* Fix buffer retention in ExpressionLambdaAggregatorFactory.
* Remove unused import.
* Add ipaddress library as dependency.
* IPv4 functions to use the inet.ipaddr package.
* Remove unused imports.
* Add new function.
* Minor rename.
* Add more unit tests.
* IPv4 address expr utils unit tests and address options.
* Adjust the IPv4Util functions.
* Move the UTs a bit around.
* Javadoc comments.
* Add license info for IPAddress.
* Fix groupId, artifact and version in license.yaml.
* Remove redundant subnet in messages - fixes UT.
* Remove unused commons-net dependency for /processing project.
* Make class and methods public so it can be accessed.
* Add initial version of benchmark
* Add subnetutils package for benchmarks.
* Auto generate ip addresses.
* Add more v4 address representations in setup to avoid bias.
* Use ThreadLocalRandom to avoid forbidden API usage.
* Adjust IPv4AddressBenchmark to adhere to codestyle rules.
* Update ipaddress library to latest 5.3.4
* Add ipaddress package dependency to benchmarks project.
* JvmMonitor: Handle more generation and collector scenarios.
ZGC on Java 11 only has a generation 1 (there is no 0). This causes
a NullPointerException when trying to extract the spacesCount for
generation 0. In addition, ZGC on Java 15 has a collector number 2
but no spaces in generation 2, which breaks the assumption that
collectors always have same-numbered spaces.
This patch adjusts things to be more robust, enabling the JvmMonitor
to work properly for ZGC on both Java 11 and 15.
* Test adjustments.
* Improve surefire arglines.
* Need a placeholder
* upgrade Airline to Airline 2
https://github.com/airlift/airline is no longer maintained, updating to
https://github.com/rvesse/airline (Airline 2) to use an actively
maintained version, while minimizing breaking changes.
Note, this is a backwards incompatible change, and extensions relying on
the CliCommandCreator extension point will also need to be updated.
* fix dependency checks where jakarta.inject is now resolved first instead
of javax.inject, due to Airline 2 using jakarta
* upgrade error-prone to 2.7.1 and support checks with Java 11+
- upgrade error-prone to 2.7.1
- support running error-prone with Java 11 and above using -Xplugin
instead of custom compiler
- add compiler arguments to ignore warnings/errors in Java 15/16
- introduce strictCompile property to enable strict profiles since we
now need multiple strict profiles for Java 8
- properly exclude all generated source files from error-prone
- fix druid-processing overriding annotation processors from parent pom
- fix druid-core disabling most non-default checks
- align plugin and annotation errorprone versions
- fix / suppress additional issues found by error-prone:
* fix bug in SeekableStreamSupervisor initializing ArrayList size with
the taskGroupdId
* fix missing @Override annotations
- remove outdated compiler plugin in benchmarks
- remove deleted ParameterPackage error-prone rule
- re-enable checks on benchmark module as well
* fix IntelliJ inspections
* disable LongFloatConversion due to bug in error-prone with JDK 8
* add comment about InsecureCrypto
* Filter http requests by http method
Add a config that allows a user which http methods to allow against their
Druid server.
Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore icode coverage for nitialization classes
* code review
* move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data
* newlines
* make test coverage not fail maybe
* remove useless test
* Update pom.xml
* Update GeneratorInputSourceTest.java
* less passive aggressive test names
* Fix potential NPEs in joins
intelliJ reported issues with potential NPEs. This was first hit in testing
with a filter being pushed down to the left hand table when joining against
an indexed table.
* More null check cleanup
* Optimize filter value rewrite for IndexedTable
* Add unit tests for LookupJoinable
* Add tests for IndexedTableJoinable
* Add non null assert for dimension selector
* Supress null warning in LookupJoinMatcher
* remove some null checks on hot path
* Forbid easily misused HashSet and HashMap constructors
* Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them
* Fix visibility of constant in CollectionUtils.java
* Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used
* revert changes to sql module tests that should be in separate PR
* Finish reverting changes to sql module tests that were flagged in checkstyle during CI
* Add netty dependency resulting from SupressForbidden
* Add MemoryOpenHashTable, a table similar to ByteBufferHashTable.
With some key differences to improve speed and design simplicity:
1) Uses Memory rather than ByteBuffer for its backing storage.
2) Uses faster hashing and comparison routines (see HashTableUtils).
3) Capacity is always a power of two, allowing simpler design and more
efficient implementation of findBucket.
4) Does not implement growability; instead, leaves that to its callers.
The idea is this removes the need for subclasses, while still giving
callers flexibility in how to handle table-full scenarios.
* Fix LGTM warnings.
* Adjust dependencies.
* Remove easymock from druid-benchmarks.
* Adjustments from review.
* Fix datasketches unit tests.
* Fix checkstyle.
* Add HashJoinSegment, a virtual segment for joins.
An initial step towards #8728. This patch adds enough functionality to implement a joining
cursor on top of a normal datasource. It does not include enough to actually do a query. For
that, future patches will need to wire this low-level functionality into the query language.
* Fixups.
* Fix missing format argument.
* Various tests and minor improvements.
* Changes.
* Remove or add tests for unused stuff.
* Fix up package locations.
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.
* Address review comments
* Adjust scope for org.glassfish.jaxb:jaxb-runtime
* Fix dependencies for hdfs-storage
* Consolidate netty4 versions
* Enable code coverage
Code coverage was disabled via
https://github.com/apache/incubator-druid/pull/3122 due to an issue with
cobertura in Travis CI. Switch code coverage tool from cobertura to
jacoco to avoid issue and re-enable coveralls for Travis CI.
* Exclude non-production code
* Exclude benchmark generated code
* Exclude DruidTestRunnerFactory
* Add IPv4 druid expressions
New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string
These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:
1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4adress_stringify to convert to (readable) string at query
time
* Fix licenses and null handling
* Simplify IPv4 expressions
* Fix tests
* Fix check for valid ipv4 address string
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.
* Fix licenses and dependencies
* Fix licenses and dependencies again
* Fix integration test dependency
* Address review comments
* Fix unit test dependencies
* Fix integration test dependency
* Fix integration test dependency again
* Fix integration test dependency third time
* Fix integration test dependency fourth time
* Fix compile error
* Fix assert package
* Double-checked locking bug is fixed.
* @Nullable is removed since there is no need to use along with @MonotonicNonNull.
* Static import is removed.
* Lazy initialization is implemented.
* Local variables used instead of volatile ones.
* Local variables used instead of volatile ones.
* Fix travis timeout in BufferHashGrouperTest
* adjust buffer size
* adjust bufferSize and loadFactor
* increase memory
* add debug code
* cat error
* after script
* print logs
* print per 2 min
* use direct mem
* clean up