* Fix broadcast rule drop and docs
* Remove racy test check
* Don't drop non-broadcast segments on tasks, add overshadowing handling
* Don't use realtimes for overshadowing
* Fix dropping for ingestion services
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.
- Add REGEXP_LIKE function that returns a boolean, and is useful in
WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
(previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
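The semantics listed above can be sketched outside of Druid. This is a minimal illustration using Python's `re` module, not the Druid implementation (which is Java-based); the helper names `regexp_like` and `regexp_extract` are hypothetical and only mirror the SQL function names. It shows a boolean match, a nullable extract, and an empty pattern that always matches.

```python
import re
from typing import Optional


def regexp_like(value: str, pattern: str) -> bool:
    """Return True when the pattern matches anywhere in value (hypothetical helper)."""
    return re.search(pattern, value) is not None


def regexp_extract(value: str, pattern: str, index: int = 0) -> Optional[str]:
    """Return the requested capture group, or None when there is no match.

    The None result is why the return type has to be treated as nullable:
    a caller that assumes "never null" could wrongly drop an IS NULL filter.
    """
    match = re.search(pattern, value)
    return match.group(index) if match else None


# An empty pattern matches every input instead of raising an error.
print(regexp_like("druid", ""))
print(regexp_extract("foo123", r"[a-z]+(\d+)", 1))
print(regexp_extract("foo", r"\d+"))
```

How Druid surfaces an empty-string match under its null-handling modes is not covered by this sketch.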
* Changes based on PR review.
* Fix arg check.
* Important fixes!
* Add speller.
* wip
* Additional tests.
* Fix up tests.
* Add validation error tests.
* Additional tests.
* Remove useless call.
* Update tutorial-query.md
* First full pass complete
* Smoothing over, a bit
* link and spell checking
* Update querying.md
* Review comments; screenshot fixes
* Making ports consistent, pending confirmation
Switching to the Router port for consistency with the tutorial ports; can switch back if it should be 8082 instead.
* Resizing screenshot
* Update querying.md
* Review feedback incorporated.
* Add AvroOCFInputFormat
* Support supplying a reader schema in AvroOCFInputFormat
* Add docs for Avro OCF input format
* Address review comments
* Address second round of review
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
* address comments
* address comments
* fix checkstyle
* address comments
* address comments
* Update data-formats.md
Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"
* Light text cleanup
* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.
* Clarifying accepted values for URI lookup
* Update index.md
* original quickstart full first pass
* first pass all the way through
* straggler
* image touchups and finished old tutorial
* a bit of finishing up
* druid-caffeine-cache ext previously removed
* Sample MaxDirectMemorySize value unrealistic
* Review comments
* fixing links
* spell checking gymnastics
* workerThreads desc slightly expanded
* typo
* Typo
* Reversing Kafka config order
* Changing order of configs for Kinesis
* Trying this again: ioConfig then tuningConfig
* Adding support for autoscaling in GCE
* adding extra google deps also in gce pom
* fix link in doc
* remove unused deps
* adding terms to spelling file
* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
* GCEXyz -> GceXyz in naming for consistency
* add preconditions
* add VisibleForTesting annotation
* typos in comments
* use StringUtils.format instead of String.format
* use custom exception instead of exit
* factor out interval time between retries
* making literal value a constant
* iterate over all network interfaces
* use provided on google (non api) deps
* adding missing dep
* removing unneeded this and using Objects methods instead of 3-way if in hash and comparison
* adding import
* adding retries around getRunningInstances and adding limit for operation end waiting
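A rough sketch of the two ideas in the entry above: retrying a flaky call a bounded number of times with a shared interval, and capping how long to wait for an operation to end. This is not the GCE autoscaler code; `call_with_retries`, `wait_until_done`, and their parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    max_attempts: int = 3,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, retrying on any exception up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < max_attempts:
                sleep(interval_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def wait_until_done(
    is_done: Callable[[], bool],
    max_polls: int,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done at most max_polls times; return False if the limit is hit."""
    for _ in range(max_polls):
        if is_done():
            return True
        sleep(interval_seconds)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.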
* refactor GceEnvironmentConfig.hashCode
* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
* removing unused config
* adding tests to hash and equals
* adding nullable to waitForOperationEnd
* adding testTerminate
* adding unit tests for createComputeService
* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
* reverting queryResponseTemplate change
* adding comment for Compute.Builder.build() returning null
- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.
* Refresh query docs.
Larger changes:
- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.
Smaller changes:
- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.
* Updates.
* Fix numbering.
* Fix glitches.
* Add new words to spellcheck file.
* Assorted changes.
* Further adjustments.
* Add missing punctuation.
* Add API to trigger a compaction by the coordinator for integration tests
* Add missing integration tests for the compaction by the coordinator
* address comments
* Document possible vulnerabilities for the druid-ranger-security extension
In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.
* Spelling checker is a bit tight
* kinesis IT
* fix kinesis timeout
* fix checkstyle
* address comments
* fix checkstyle
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication
* update version in druid-pac4j pom
* introducing unauthorized resource filter
* authenticated but authorized /unified-webconsole.html
* use httpReq.getRequestURI() for matching callback path
* add documentation
* minor doc addition
* license file updates
* make dependency analyze succeed
* fix doc build
* hopefully fixes doc build
* hopefully fixes license check build
* yet another try on fixing license build
* revert unintentional changes to website folder
* update version to 0.18.0-SNAPSHOT
* check session and its expiry on each request
* add crypto service
* code for encrypting the cookie
* update doc with cookiePassphrase
* update license yaml
* make sessionstore in Pac4jFilter private non static
* make Pac4jFilter fields final
* okta: use sha256 for hmac
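For readers unfamiliar with the term, HMAC with SHA-256 can be sketched with Python's standard library. This is only an illustration of the primitive; how the pac4j extension actually applies it is not described in this log, and the `sign` and `verify` helpers are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for message, as a hex string."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, signature: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, message), signature)
```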
* remove incubating
* add UTs for crypto util and session store impl
* use standard charsets
* add license header
* remove unused file
* add org.objenesis.objenesis to license.yaml
* a bit of nit changes in CryptoService and embedding EncryptionResult for clarity
* rename alg to cipherAlgName
* take cipher alg name, mode and padding as input
* add java doc for CryptoService and make it more understandable
* another UT for CryptoService
* cache pac4j Config
* use generics clearly in Pac4jSessionStore
* update cookiePassphrase doc to mention PasswordProvider
* mark stuff Nullable where appropriate in Pac4jSessionStore
* update doc to mention jdbc
* add error log on reaching callback resource
* javadoc for Pac4jCallbackResource
* introduce NOOP_HTTP_ACTION_ADAPTER
* add correct module name in license file
* correct extensions folder name in licenses.yaml
* replace druid-kubernetes-extensions with druid-pac4j
* cache SecureRandom instance
* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
* SQL support for joins on subqueries.
Changes to SQL module:
- DruidJoinRule: Allow joins on subqueries (left/right are no longer
required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
version of Calcite. Some of these are useful for optimizing joins on
subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
relevant constants in CostEstimates.
Other changes:
- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
of isComplete is enough to let callers know that columns might have
multiple values, and explicitly setting it to true causes
ExpressionSelectors to think it definitely has multiple values, and
treat the inputs as arrays. This behavior interfered with some of the
new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
tests.
* Fixes for tests.
* Adjustments.
* Match GREATEST/LEAST function behavior
Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.
* Match postgres behavior & handle more SQL types
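As a sketch of what "function instead of aggregator" means here: GREATEST and LEAST compare their arguments within a single row rather than across rows. The null handling below follows PostgreSQL's documented behavior (NULL arguments are ignored, and the result is NULL only when every argument is NULL); that Druid matches this in every case is an assumption, and the helper names are hypothetical.

```python
from typing import Any, Optional


def greatest(*values: Optional[Any]) -> Optional[Any]:
    """Largest non-None argument, or None if all arguments are None."""
    present = [value for value in values if value is not None]
    return max(present) if present else None


def least(*values: Optional[Any]) -> Optional[Any]:
    """Smallest non-None argument, or None if all arguments are None."""
    present = [value for value in values if value is not None]
    return min(present) if present else None


# Applied per row, not across rows.
rows = [(1, None, 3), (None, None, None), (7, 2, 5)]
for row in rows:
    print(greatest(*row), least(*row))
```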
* Fix imports
* Add OnHeapMemorySegmentWriteOutMediumFactory
Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.
* Register OnHeapMemorySegmentWriteOutMediumFactory.
Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.
* Remove unnecessary throws
The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.
* Update SegmentWriteOutMedium docs to include onHeapMemory
Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapMemorySegmentWriteOutMedium option.
* Skip empty files for local, hdfs, and cloud input sources
* split hint spec doc
* doc for skipping empty files
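The idea of skipping empty files can be sketched as a size filter applied before files are split into ingestion tasks. This is an illustration only; `FileEntry` and `non_empty` are hypothetical names, not the input source API.

```python
from typing import Iterable, List, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical file listing entry: a path plus its size in bytes."""
    path: str
    size_bytes: int


def non_empty(files: Iterable[FileEntry]) -> List[FileEntry]:
    """Keep only files that contain at least one byte."""
    return [entry for entry in files if entry.size_bytes > 0]
```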
* fix typo; adjust tests
* unnecessary fluent iterable
* address comments
* fix test
* use the right lists
* fix test
* fix test
* Add SQL GROUPING SETS support.
Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:
- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
each grouping set. This is more in line with SQL standard behavior. I think it is okay
to make this change, since the old behavior was not documented, so users should
hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
should not have been.
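The limit change can be illustrated with a small sketch: compute one result block per grouping set, concatenate them, and only then apply the limit to the combined output. This is not the groupBy engine; `subtotals` and its parameters are hypothetical, and the only aggregation shown is a sum.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def subtotals(
    rows: List[Dict],
    dimensions: Sequence[str],
    grouping_sets: Sequence[Tuple[str, ...]],
    metric: str,
    limit: Optional[int] = None,
) -> List[Dict]:
    """Sum metric for each grouping set, then apply limit to the whole result."""
    results: List[Dict] = []
    for grouping in grouping_sets:
        totals: Dict[tuple, int] = defaultdict(int)
        for row in rows:
            # Dimensions outside the current grouping set are reported as None.
            key = tuple(row[d] if d in grouping else None for d in dimensions)
            totals[key] += row[metric]
        for key, total in totals.items():
            results.append({**dict(zip(dimensions, key)), metric: total})
    # The limit applies once, after all grouping sets, not once per set.
    return results if limit is None else results[:limit]
```

With three grouping sets and `limit=4`, the caller gets four rows in total rather than up to four rows per grouping set.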
Also fixes two bugs in query equality checking:
- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.
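The lazily initialized field problem can be shown in miniature. In this sketch (a hypothetical `IntervalQuery` class, not BaseQuery), comparing the cached field directly would make two equivalent objects unequal when only one of them has computed the cache; going through the getter avoids that.

```python
from typing import Optional


class IntervalQuery:
    """Hypothetical query with a lazily computed duration."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self._duration: Optional[int] = None  # filled in on first use

    def get_duration(self) -> int:
        if self._duration is None:
            self._duration = self.end - self.start
        return self._duration

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, IntervalQuery):
            return NotImplemented
        # Use the getter, never the raw cache, so initialization order is irrelevant.
        return (
            self.start == other.start
            and self.end == other.end
            and self.get_duration() == other.get_duration()
        )

    def __hash__(self) -> int:
        return hash((self.start, self.end, self.get_duration()))
```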
* Fix bugs.
* Fix tests.
* PR updates.
* Grouping class hygiene.