* Add "offset" parameter to GroupBy query.
It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.
* Stabilize GroupBy sorts.
* Fix inspections.
* Fix suppression.
* Fixups.
* Move TopNSequence to druid-core.
* Addl comments.
* NumberedElement equals verification.
* Changes from review.
* Fix minor formatting in docs.
* Add Nullhandling initialization for test to run from IDE.
* Vectorize longMin aggregator.
- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.
* Add longSum to the supported vectorized aggregator implementations.
* Add MIN() long min to calcite query test that can vectorize.
* Add simple long aggregations test.
* Fixup formatting per checkstyle guide.
* fixup and add more tests for long min aggregator.
* Override test for groupBy since timestamps are handled differently.
* Null compatibility check in test.
* Review comment: Add a test case to LongMinAggregationTest.
* support unit suffix on byte-related properties
* add doc
* change default value of byte-related properites in example files
* fix coding style
* fix doc
* fix CI
* suppress spelling errors
* improve code according to comments
* rename Bytes to HumanReadableBytes
* add getBytesInInt to get value safely
* improve doc
* fix problem reported by CI
* fix problem reported by CI
* resolve code review comments
* improve error message
* improve code & doc according to comments
* fix CI problem
* improve doc
* suppress spelling check errors
* Add segment pruning for hash based partitioning
* Update doc
* Add additional test
* Address comments
* Fix unit test failure
Co-authored-by: Jian Wang <jwang@pinterest.com>
* Add availability and consistency docs.
Describes transactional ingestion and atomic replacement. Also, this patch
deletes some bad advice from the javadocs for SegmentTransactionalInsertAction.
* Fix missing word.
* init commit, all tests passed
* fix format
Signed-off-by: frank chen <frank.chen021@outlook.com>
* data stored successfully
* modify config path
* add doc
* add aliyun-oss extension to project
* remove descriptor deletion code to avoid warning message output by aliyun client
* fix warnings reported by lgtm-com
* fix ci warnings
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix errors reported by intellj inspection check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc spelling check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix dependency warnings reported by ci
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix warnings reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add package configuration to support showing extension info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add IT test cases and fix bugs
Signed-off-by: frank chen <frank.chen021@outlook.com>
* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add license info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* exclude execution of IT testcases of OSS extension from CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* put the extensions under contrib group and add to distribution
* fix names in test cases
* add unit test to cover OssInputSource
* fix names in test cases
* fix dependency problem reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Filter http requests by http method
Add a config that allows a user which http methods to allow against their
Druid server.
Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore icode coverage for nitialization classes
* code review
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* Allow append to existing datasources when dynamic partitioing is used
* fix test
* checkstyle
* checkstyle
* fix test
* fix test
* fix other tests..
* checkstyle
* hansle unknown core partitions size in overlord segment allocation
* fail to append when numCorePartitions is unknown
* log
* fix comment; rename to be more intuitive
* double append test
* cleanup complete(); add tests
* fix build
* add tests
* address comments
* checkstyle
* Druid user permissions apply in the console
* Update index.md
* noting user warning in console page; some minor shuffling
* noting user warning in console page; some minor shuffling 1
* touchups
* link checking fixes
* Updated per suggestions
* change default number of segment loading threads
* fix docs
* missed file
* min -> max for segment loading threads
Co-authored-by: Dylan <dwylie@spotx.tv>
* fix docs error: google to azure and hdfs to http
* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part
* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part
* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* fix checksyle
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* fix spelling
* address comments
* fix checkstyle
* update docs
* fix tests
* fix doc
* address comments
* fix typo
* fix spelling
* address comments
* address comments
* fix typo in docs
* ROUND and having comparators correctly handle doubles
Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.
This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.
The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.
* Add tests and fix spellcheck
* update error message in ExpressionsTest
* Address comments
* fix up round for infinity
* round non numeric doubles returns a double
* fix spotbugs
* Update docs/misc/math-expr.md
* Update docs/querying/sql.md
* lpad and rpad functions deal with empty pad
Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string
* Fix rpad
* Match PostgreSQL behavior in SQL compliant null handling mode
* Match PostgreSQL behavior for pad -ve len
* address review comments
* Fix broadcast rule drop and docs
* Remove racy test check
* Don't drop non-broadcast segments on tasks, add overshadowing handling
* Don't use realtimes for overshadowing
* Fix dropping for ingestion services
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.
- Add REGEXP_LIKE function that returns a boolean, and is useful in
WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
(previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
* Changes based on PR review.
* Fix arg check.
* Important fixes!
* Add speller.
* wip
* Additional tests.
* Fix up tests.
* Add validation error tests.
* Additional tests.
* Remove useless call.
* Update tutorial-query.md
* First full pass complete
* Smoothing over, a bit
* link and spell checking
* Update querying.md
* Review comments; screenshot fixes
* Making ports consistent, pending confirmation
Switching to the Router port, to make this be consistent with the tutorial ports, but can switch back here and there if it should be 8082 instead.
* Resizing screenshot
* Update querying.md
* Review feedback incorporated.
* Add AvroOCFInputFormat
* Support supplying a reader schema in AvroOCFInputFormat
* Add docs for Avro OCF input format
* Address review comments
* Address second round of review
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* address comments
* address comments
* fix checkstyle
* address comments
* address comments
* Update data-formats.md
Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"
* Light text cleanup
* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.
* Clarifying accepted values for URI lookup
* Update index.md
* original quickstart full first pass
* original quickstart full first pass
* first pass all the way through
* straggler
* image touchups and finished old tutorial
* a bit of finishing up
* druid-caffeine-cache ext previously removed
* Sample MaxDirectMemorySize value unrealistic
* Review comments
* fixing links
* spell checking gymnastics
* workerThreads desc slightly expanded
* typo
* Typo
* Reversing Kafka config order
* Changing order of configs for Kinesis
* Trying this again: ioConfig then tuningConfig
* Update data-formats.md
Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"
* Light text cleanup
* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.
* Update index.md
* original quickstart full first pass
* original quickstart full first pass
* first pass all the way through
* straggler
* image touchups and finished old tutorial
* a bit of finishing up
* Review comments
* fixing links
* spell checking gymnastics
* Adding support for autoscaling in GCE
* adding extra google deps also in gce pom
* fix link in doc
* remove unused deps
* adding terms to spelling file
* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
* GCEXyz -> GceXyz in naming for consistency
* add preconditions
* add VisibleForTesting annotation
* typos in comments
* use StringUtils.format instead of String.format
* use custom exception instead of exit
* factorize interval time between retries
* making literal value a constant
* iter all network interfaces
* use provided on google (non api) deps
* adding missing dep
* removing unneded this and use Objects methods instead o 3-way if in hash and comparison
* adding import
* adding retries around getRunningInstances and adding limit for operation end waiting
* refactor GceEnvironmentConfig.hashCode
* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
* removing unused config
* adding tests to hash and equals
* adding nullable to waitForOperationEnd
* adding testTerminate
* adding unit tests for createComputeService
* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
* reverting queryResponseTemplate change
* adding comment for Compute.Builder.build() returning null
- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.
* Refresh query docs.
Larger changes:
- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.
Smaller changes:
- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.
* Updates.
* Fix numbering.
* Fix glitches.
* Add new words to spellcheck file.
* Assorted changes.
* Further adjustments.
* Add missing punctuation.
* Add API to trigger a compaction by the coordinator for integration tests
* Add missing integration tests for the compaction by the coordinator
* address comments
* Document possible vulnerabilities for the druid-ranger-security
In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.
* Spelling checker is a bit tight
* kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* Kinesis IT
* fix kinesis timeout
* Kinesis IT
* Kinesis IT
* fix checkstyle
* Kinesis IT
* address comments
* fix checkstyle
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication
* update version in druid-pac4j pom
* introducing unauthorized resource filter
* authenticated but authorized /unified-webconsole.html
* use httpReq.getRequestURI() for matching callback path
* add documentation
* minor doc addition
* licesne file updates
* make dependency analyze succeed
* fix doc build
* hopefully fixes doc build
* hopefully fixes license check build
* yet another try on fixing license build
* revert unintentional changes to website folder
* update version to 0.18.0-SNAPSHOT
* check session and its expiry on each request
* add crypto service
* code for encrypting the cookie
* update doc with cookiePassphrase
* update license yaml
* make sessionstore in Pac4jFilter private non static
* make Pac4jFilter fields final
* okta: use sha256 for hmac
* remove incubating
* add UTs for crypto util and session store impl
* use standard charsets
* add license header
* remove unused file
* add org.objenesis.objenesis to license.yaml
* a bit of nit changes in CryptoService and embedding EncryptionResult for clarity
* rename alg to cipherAlgName
* take cipher alg name, mode and padding as input
* add java doc for CryptoService and make it more understandable
* another UT for CryptoService
* cache pac4j Config
* use generics clearly in Pac4jSessionStore
* update cookiePassphrase doc to mention PasswordProvider
* mark stuff Nullable where appropriate in Pac4jSessionStore
* update doc to mention jdbc
* add error log on reaching callback resource
* javadoc for Pac4jCallbackResource
* introduce NOOP_HTTP_ACTION_ADAPTER
* add correct module name in license file
* correct extensions folder name in licenses.yaml
* replace druid-kubernetes-extensions to druid-pac4j
* cache SecureRandom instance
* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
* SQL support for joins on subqueries.
Changes to SQL module:
- DruidJoinRule: Allow joins on subqueries (left/right are no longer
required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
version of Calcite. Some of these are useful for optimizing joins on
subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
relevant constants in CostEstimates.
Other changes:
- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
of isComplete is enough to let callers know that columns might have
multiple values, and explicitly setting it to true causes
ExpressionSelectors to think it definitely has multiple values, and
treat the inputs as arrays. This behavior interfered with some of the
new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
tests.
* Fixes for tests.
* Adjustments.
* Match GREATEST/LEAST function behavior
Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.
* Match postgres behavior & handle more SQL types
* Fix imports
* Add OnHeapMemorySegmentWriteOutMediumFactory
Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.
* Register OnHeapMemorySegmentWriteOutMediumFactory.
Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.
* Remove unnecessary throws
The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.
* Update SegmentWriteOutMedium docs to include onHeapMemory
Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapSegmentMediumWriteOut option.
* Skip empty files for local, hdfs, and cloud input sources
* split hint spec doc
* doc for skipping empty files
* fix typo; adjust tests
* unnecessary fluent iterable
* address comments
* fix test
* use the right lists
* fix test
* fix test
* Add SQL GROUPING SETS support.
Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:
- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
each grouping set. This is more in line with SQL standard behavior. I think it is okay
to make this change, since the old behavior was not documented, so users should
hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
should not have been.
Also fixes two bugs in query equality checking:
- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.
* Fix bugs.
* Fix tests.
* PR updates.
* Grouping class hygiene.
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* fix build failure
* fix failing build
* fix failing build
* Code cleanup
* fix failing test
* Removed CloudConfigProperties and make specific class for each cloudInputSource
* Removed CloudConfigProperties and make specific class for each cloudInputSource
* pass s3ConfigProperties for split
* lazy init s3client
* update docs
* fix docs check
* address comments
* add ServerSideEncryptingAmazonS3.Builder
* fix failing checkstyle
* fix typo
* wrap the ServerSideEncryptingAmazonS3.Builder in a provider
* added java docs for S3InputSource constructor
* added java docs for S3InputSource constructor
* remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
* Move Azure extension into Core
Moving the azure extension into Core.
* * Fix build failure
* * Add The MIT License (MIT) to list of compatible licenses
* * Address review comments
* * change reference to contrib azure to core azure
* * Fix spelling mistakes.
* Add common optional dependencies for extensions
Include hadoop-aws and postgres JDBC connector jar to improve
out-of-the-box experience for extensions. The mysql JDBC connector jar
is not bundled as it is GPL.
* Update docs
* Fix typo
* Create splits of multiple files for parallel indexing
* fix wrong import and npe in test
* use the single file split in tests
* rename
* import order
* Remove specific local input source
* Update docs/ingestion/native-batch.md
Co-Authored-By: sthetland <steve.hetland@imply.io>
* Update docs/ingestion/native-batch.md
Co-Authored-By: sthetland <steve.hetland@imply.io>
* doc and error msg
* fix build
* fix a test and address comments
Co-authored-by: sthetland <steve.hetland@imply.io>
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays
* oops, macros are expressions too
* style
* spotbugs
* qualified type arrays
* review stuffs
* simplify grammar
* more permissive array parsing
* reuse expr joiner
* fix it
* Add Azure config options for segment prefix and max listing length
Added configuration options to allow the user to specify the prefix
within the segment container to store the segment files. Also
added a configuration option to allow the user to specify the
maximum number of input files to stream for each iteration.
* * Fix test failures
* * Address review comments
* * add dependency explicitly to pom
* * update docs
* * Address review comments
* * Address review comments
* Add config option for namespacePrefix
opentsdb emitter sends metric names to opentsdb verbatim as what druid
names them, for example "query.count", this doesn't fit well with a
central opentsdb server which might have namespaced metrics, for example
"druid.query.count". This adds support for adding an optional prefix.
The prefix also gets a trailing dot (.), after it, so the metric name
becomes <namespacePrefix>.<metricname>
configureable as "druid.emitter.opentsdb.namespacePrefix", as
documented.
Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com>
* Spelling for PR #9372
Added "namespacePrefix" to .spelling exceptions, it's a variable name
used in documentation for opentsdb-emitter.
* fixing tests for PR #9372
changed naming of variables to be more descriptive
added test of prefix being an empty string: "".
added a conditional to buildNamespacePrefix to check for empty string
being fed if EventConverter called without OpentsdbEmitterConfig
instance.
* fixing checkstyle errors for PR #9372
used == to compare literal string, should be equals()
* cleaned up and updated PR #9372
Created a buildMetric function as suggested by clintropolis, and
removed redundant tests for empty strings as they're only used when
calling EventConverter directly without going through
OpentsdbEmitterConfig.
* consistent naming of tests PR #9372
Changed names of tests in files to match better with what it was
actually testing
changed check for Strings.isNullOrEmpty to just check for `null`, as
empty string valued `namespacePrefix` is handled in
OpentsdbEmitterConfig.
Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>