druid

Author	SHA1	Message	Date
frank chen	b91d50044e	add some details to the build doc (#9885 ) * update initial build command * add some details for building * fix spelling check errors * fix spelling check warnings Signed-off-by: frank chen <frank.chen021@outlook.com>	2020-05-21 12:35:54 -07:00
Jianhuan Liu	2050f2b00a	fix docs error: google to azure and hdfs to http (#9881)	2020-05-20 10:17:39 -07:00
Joseph Glanville	793f386d6a	Add support for Avro OCF using InputFormat (#9671 ) * Add AvroOCFInputFormat * Support supplying a reader schema in AvroOCFInputFormat * Add docs for Avro OCF input format * Address review comments * Address second round of review	2020-05-16 14:09:12 -07:00
Maytas Monsereenusorn	0a8bf83bc5	Bad plan for table-lookup-lookup join with filter on first lookup and outer limit (#9773 ) * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * Bad plan for table-lookup-lookup join with filter on first lookup and outer limit * address comments * address comments * fix checkstyle * address comments * address comments	2020-05-14 16:56:40 -07:00
awelsh93	6f25a84d2e	Add TaskCountStatsMonitor to config docs (#9447)	2020-05-11 14:08:46 -07:00
sthetland	ce03f31a73	Clarifying workerThreads and a few other nits (#9804 ) * Update data-formats.md Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }" * Light text cleanup * Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here. * Clarifying accepted values for URI lookup * Update index.md * original quickstart full first pass * original quickstart full first pass * first pass all the way through * straggler * image touchups and finished old tutorial * a bit of finishing up * druid-caffeine-cache ext previously removed * Sample MaxDirectMemorySize value unrealistic * Review comments * fixing links * spell checking gymnastics * workerThreads desc slightly expanded * typo * Typo * Reversing Kafka config order * Changing order of configs for Kinesis * Trying this again: ioConfig then tuningConfig	2020-05-06 09:05:18 -07:00
Alexander Saydakov	844d626738	added number of bins parameter (#9436 ) * added number of bins parameter * addressed review points * test equals Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2020-05-04 16:53:09 -07:00
Jian Wang	85dfbb64cb	Update documention for metricCompression (#9811 )	2020-05-03 12:56:48 -07:00
sthetland	c61365c1e0	Druid Quickstart refactor and update (#9766 ) * Update data-formats.md Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }" * Light text cleanup * Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here. * Update index.md * original quickstart full first pass * original quickstart full first pass * first pass all the way through * straggler * image touchups and finished old tutorial * a bit of finishing up * Review comments * fixing links * spell checking gymnastics	2020-04-30 12:07:28 -07:00
Aleksei Chumagin	0642f778fa	changed Preview to Apply (#9757)	2020-04-29 09:53:25 -07:00
James Dalton	b279e04a31	table fix (#9769)	2020-04-28 11:23:24 -07:00
Francesco Nidito	e7e41e3a36	Adding support for autoscaling in GCE (#8987 ) * Adding support for autoscaling in GCE * adding extra google deps also in gce pom * fix link in doc * remove unused deps * adding terms to spelling file * version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT * GCEXyz -> GceXyz in naming for consistency * add preconditions * add VisibleForTesting annotation * typos in comments * use StringUtils.format instead of String.format * use custom exception instead of exit * factorize interval time between retries * making literal value a constant * iter all network interfaces * use provided on google (non api) deps * adding missing dep * removing unneded this and use Objects methods instead o 3-way if in hash and comparison * adding import * adding retries around getRunningInstances and adding limit for operation end waiting * refactor GceEnvironmentConfig.hashCode * 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT * removing unused config * adding tests to hash and equals * adding nullable to waitForOperationEnd * adding testTerminate * adding unit tests for createComputeService * increasing retries in unrelated integration-test to prevent sporadic failure (hopefully) * reverting queryResponseTemplate change * adding comment for Compute.Builder.build() returning null	2020-04-28 03:13:39 -07:00
Gian Merlino	4087a015e8	Datasource doc structure adjustments. (#9716 ) - Reorder both the datasource and query-execution page orderings to table, lookup, union, inline, query, join. (Roughly increasing order of conceptual "fanciness".) - Add more crosslinks from datasource page to query-execution page: one per datasource type.	2020-04-23 16:04:59 -07:00
Clint Wylie	e677c62484	document useFilterCNF query context parameter (#9647 ) * document useFilterCNF query context parameter * move context key to QueryContexts * Update .spelling	2020-04-16 22:12:20 -07:00
Clint Wylie	b89ad49396	disable group by config applyLimitPushDownToSegment by default (#9711 ) * disable group by config applyLimitPushDownToSegment by default * document	2020-04-16 03:03:35 -07:00
Gian Merlino	42590ae64b	Refresh query docs. (#9704 ) * Refresh query docs. Larger changes: - New doc: querying/datasource.md describes the various kinds of datasources you can use, and has examples for both SQL and native. - New doc: querying/query-execution.md describes how native queries are executed at a high level. It doesn't go into the details of specific query engines or how queries run at a per-segment level. But I think it would be good to add or link that content here in the future. - Refreshed doc: querying/sql.md updated to refer to joins, reformatted a bit, added a new "Query translation" section that explains how queries are translated from SQL to native, and removed configuration details (moved to configuration/index.md). - Refreshed doc: querying/joins.md updated to refer to join datasources. Smaller changes: - Add helpful banners to the top of query documentation pages telling people whether a given page describes SQL, native, or both. - Add SQL metrics to operations/metrics.md. - Add some color and cross-links in various places. - Add native query component docs to the sidebar, and renamed them so they look nicer. - Remove Select query from the sidebar. - Fix Broker SQL configs in configuration/index.md. Remove them from querying/sql.md. - Combined querying/searchquery.md and querying/searchqueryspec.md. * Updates. * Fix numbering. * Fix glitches. * Add new words to spellcheck file. * Assorted changes. * Further adjustments. * Add missing punctuation.	2020-04-15 16:12:20 -07:00
Maytas Monsereenusorn	8328d91b30	Add missing integration tests for the compaction by the coordinator (#9644 ) * Add API to trigger a compaction by the coordinator for integration tests * Add missing integration tests for the compaction by the coordinator * address comments	2020-04-15 14:27:33 -07:00
Will Salisbury	cda9f41e69	s/S3/GCS/g (#9700 ) fix typo [ at least I hope this was a typo… ]	2020-04-14 18:39:54 -07:00
Himanshu	ca369e5768	druid-pac4j: add ability to use custom ssl trust store while talking to auth server (#9637 ) * druid-pac4j: add ability for custom ssl trust store for talking to auth server * fix nimbusds DefaultResourceRetriever name in comment	2020-04-10 18:01:59 -07:00
bolkedebruin	ab5ac7f890	Document possible vulnerabilities for the druid-ranger-security (#9649 ) * Document possible vulnerabilities for the druid-ranger-security In certain configurations the ranger plugin can expose vulnerabilities due to some of its dependencies having CVEs. * Spelling checker is a bit tight	2020-04-09 10:43:11 -07:00
bolkedebruin	2d99966933	Add Apache Ranger Authorization (#9579)	2020-04-04 18:02:24 +02:00
Maytas Monsereenusorn	1852bf33ea	Add Integration Test for functionality of kinesis ingestion (#9576 ) * kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * fix kinesis timeout * Kinesis IT * Kinesis IT * fix checkstyle * Kinesis IT * address comments * fix checkstyle	2020-04-03 09:45:22 -07:00
Neil Volungis	0ac875a8b4	Update docker.md readme to note memory requirements (#9529 ) * Update docker.md readme to note memory requirements * Fix grammatical error Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com> Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>	2020-03-24 03:33:29 -07:00
Clint Wylie	bf85ea19b2	roaring bitmaps by default (#9548 ) * it is finally time * fix it * more docs * fix doc	2020-03-23 18:15:57 -07:00
Himanshu	5604ac7963	druid extension for OpenID Connect auth using pac4j lib (#8992 ) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * licesne file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter	2020-03-23 18:15:45 -07:00
Clint Wylie	d8833316c4	fix broken links (#9537 ) * fix broken links * missing / * adjustment	2020-03-22 17:41:18 -07:00
Gian Merlino	54c9325256	SQL support for joins on subqueries. (#9545 ) * SQL support for joins on subqueries. Changes to SQL module: - DruidJoinRule: Allow joins on subqueries (left/right are no longer required to be scans or mappings). - DruidJoinRel: Add cost estimation code for joins on subqueries. - DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can handle this case now. - DruidRel: Remove Nullable annotation from toDruidQuery, because it is no longer needed (it was used by DruidSemiJoinRel). - Update Rules constants to reflect new rules available in our current version of Calcite. Some of these are useful for optimizing joins on subqueries. - Rework cost estimation to be in terms of cost per row, and place all relevant constants in CostEstimates. Other changes: - RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack of isComplete is enough to let callers know that columns might have multiple values, and explicitly setting it to true causes ExpressionSelectors to think it definitely has multiple values, and treat the inputs as arrays. This behavior interfered with some of the new tests that involved queries on lookups. - QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql tests. * Fixes for tests. * Adjustments.	2020-03-22 16:43:55 -07:00
Clint Wylie	68013fbc64	fix issue where total limit was being applied even when not configured (#9534 ) * fix issue where total limit was being applied even when not configured * fix inspection * add reserved lane name check to manual laning strategy	2020-03-18 18:05:59 -07:00
Chi Cao Minh	e7b3dd9cd1	Update to mysql connector 5.1.48 (#9514)	2020-03-16 10:38:31 -07:00
Clint Wylie	69af760a19	add manual laning strategy, integration test (#9492 ) * add manual laning strategy, integration test, json config test * share percent conversion method * wrong assert * review stuffs * doc adjustments * more tests * test adjustment * adjust docs * Update index.md	2020-03-13 20:06:55 -07:00
Clint Wylie	6afd55c8f4	threshold based automatic query prioritization (#9493 ) * threshold based automatic query prioritization * fixes * spelling and fixes * fix docs * spelling * checkstyle * adjustments * doc fix	2020-03-13 01:41:54 -07:00
Chi Cao Minh	6b02991464	Match GREATEST/LEAST function behavior to other DBs (#9488 ) * Match GREATEST/LEAST function behavior Change the behavior of the GREATEST / LEAST functions to be similar to how it is implemented in other databases (as functions instead of aggregators). The GREATEST/LEAST functions are not in the SQL standard, but users will expect behavior similar to what other databases provide. * Match postgres behavior & handle more SQL types * Fix imports	2020-03-12 15:10:11 -07:00
Maytas Monsereenusorn	e9888f41cb	Modify check java version script to indicate experimental support for Java 11 (#9455 ) * Modify check java version script to indicate experimental support for Java 11 * update docs	2020-03-11 09:22:39 -07:00
Himanshu	75a5591448	remove old unused zookeeper dependent lookups code (#9480 ) * remove old unused zookeeper dependent lookups code * make intellij inspector happy	2020-03-10 12:12:48 -07:00
Clint Wylie	8b9fe6f584	query laning and load shedding (#9407 ) * prototype * merge QueryScheduler and QueryManager * everything in its right place * adjustments * docs * fixes * doc fixes * use resilience4j instead of semaphore * more tests * simplify * checkstyle * spelling * oops heh * remove unused * simplify * concurrency tests * add SqlResource tests, refactor error response * add json config tests * use LongAdder instead of AtomicLong * remove test only stuffs from scheduler * javadocs, etc * style * partial review stuffs * adjust * review stuffs * more javadoc * error response documentation * spelling * preserve user specified lane for NoSchedulingStrategy * more test, why not * doc adjustment * style * missed review for make a thing a constant * fixes and tests * fix test * Update docs/configuration/index.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc update Co-authored-by: sthetland <steve.hetland@imply.io>	2020-03-10 02:57:16 -07:00
Jihoon Son	75e2051195	Convert array_contains() and array_overlaps() into native filters if possible (#9487 ) * Convert array_contains() and array_overlaps() into native filters if possible * make spotbugs happy and fix null results when null compatible	2020-03-09 22:50:38 -07:00
Maytas Monsereenusorn	814f5a9717	add password provider reference to s3 optional cred docs (#9439)	2020-03-09 17:56:42 -07:00
Julian Jaffe	eda03630d0	Add OnHeapMemorySegmentWriteOutMediumFactory (#9454 ) * Add OnHeapMemorySegmentWriteOutMediumFactory Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark. * Register OnHeapMemorySegmentWriteOutMediumFactory. Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory. * Remove unnecessary throws The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception. * Update SegmentWriteOutMedium docs to include onHeapMemory Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapSegmentMediumWriteOut option.	2020-03-05 22:34:08 -08:00
Jihoon Son	3016057178	Make Transform an ExtensionPoint (#9319 ) * Make Transform an ExtensionPoint * Add transform to the list of documented extensions * Add example transform implementation	2020-03-04 12:13:14 -08:00
Jihoon Son	9466ac7c9b	Skip empty files for local, hdfs, and cloud input sources (#9450 ) * Skip empty files for local, hdfs, and cloud input sources * split hint spec doc * doc for skipping empty files * fix typo; adjust tests * unnecessary fluent iterable * address comments * fix test * use the right lists * fix test * fix test	2020-03-03 20:51:06 -08:00
Gian Merlino	c9faf3e148	Add SQL GROUPING SETS support. (#9122 ) * Add SQL GROUPING SETS support. Built on top of the subtotalsSpec feature in the groupBy query. This also involves two changes to subtotalsSpec: - Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to each grouping set. This is more in line with SQL standard behavior. I think it is okay to make this change, since the old behavior was not documented, so users should hopefully not be depending on it. - Fix a bug where virtual columns were included in the subtotal queries, but they should not have been. Also fixes two bugs in query equality checking: - BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the latter is lazily initialized and might be null in one query but not the other. - GroupByQuery: Include subtotalsSpec in equals and hashCode. * Fix bugs. * Fix tests. * PR updates. * Grouping class hygiene.	2020-02-26 08:52:39 -08:00
Maytas Monsereenusorn	92fb83726b	Add support for optional aws credentials for s3 for ingestion (#9375 ) * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * fix build failure * fix failing build * fix failing build * Code cleanup * fix failing test * Removed CloudConfigProperties and make specific class for each cloudInputSource * Removed CloudConfigProperties and make specific class for each cloudInputSource * pass s3ConfigProperties for split * lazy init s3client * update docs * fix docs check * address comments * add ServerSideEncryptingAmazonS3.Builder * fix failing checkstyle * fix typo * wrap the ServerSideEncryptingAmazonS3.Builder in a provider * added java docs for S3InputSource constructor * added java docs for S3InputSource constructor * remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider	2020-02-25 20:59:53 -08:00
zachjsh	d771b42ed1	Move Azure extension into Core (#9394 ) * Move Azure extension into Core Moving the azure extension into Core. * * Fix build failure * * Add The MIT License (MIT) to list of compatible licenses * * Address review comments * * change reference to contrib azure to core azure * * Fix spelling mistakes.	2020-02-25 17:49:16 -08:00
als-sdin	f619903403	Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400)	2020-02-25 13:15:55 -08:00
Chi Cao Minh	7fc99ee206	Add common optional dependencies for extensions (#9399 ) * Add common optional dependencies for extensions Include hadoop-aws and postgres JDBC connector jar to improve out-of-the-box experience for extensions. The mysql JDBC connector jar is not bundled as it is GPL. * Update docs * Fix typo	2020-02-25 00:04:00 -08:00
Jihoon Son	3bc7ae782c	Create splits of multiple files for parallel indexing (#9360 ) * Create splits of multiple files for parallel indexing * fix wrong import and npe in test * use the single file split in tests * rename * import order * Remove specific local input source * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc and error msg * fix build * fix a test and address comments Co-authored-by: sthetland <steve.hetland@imply.io>	2020-02-24 17:34:39 -08:00
Clint Wylie	6d8dd5ec10	string -> expression -> string -> expression (#9367 ) * add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays * oops, macros are expressions too * style * spotbugs * qualified type arrays * review stuffs * simplify grammar * more permissive array parsing * reuse expr joiner * fix it	2020-02-21 15:43:02 -08:00
zachjsh	f707064bed	Add Azure config options for segment prefix and max listing length (#9356 ) * Add Azure config options for segment prefix and max listing length Added configuration options to allow the user to specify the prefix within the segment container to store the segment files. Also added a configuration option to allow the user to specify the maximum number of input files to stream for each iteration. * * Fix test failures * * Address review comments * * add dependency explicitly to pom * * update docs * * Address review comments * * Address review comments	2020-02-21 14:12:03 -08:00
Jihoon Son	141d8dd875	Enable druid.coordinator.kill.pendingSegments.on by default (#9385 ) * Enable druid.coordinator.kill.pendingSegments.on by default * checkstyle	2020-02-21 13:13:49 -08:00
Björn Zettergren	30c24df4d3	Add config option for namespacePrefix (#9372 ) * Add config option for namespacePrefix opentsdb emitter sends metric names to opentsdb verbatim as what druid names them, for example "query.count", this doesn't fit well with a central opentsdb server which might have namespaced metrics, for example "druid.query.count". This adds support for adding an optional prefix. The prefix also gets a trailing dot (.), after it, so the metric name becomes <namespacePrefix>.<metricname> configureable as "druid.emitter.opentsdb.namespacePrefix", as documented. Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com> Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com> Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com> * Spelling for PR #9372 Added "namespacePrefix" to .spelling exceptions, it's a variable name used in documentation for opentsdb-emitter. * fixing tests for PR #9372 changed naming of variables to be more descriptive added test of prefix being an empty string: "". added a conditional to buildNamespacePrefix to check for empty string being fed if EventConverter called without OpentsdbEmitterConfig instance. * fixing checkstyle errors for PR #9372 used == to compare literal string, should be equals() * cleaned up and updated PR #9372 Created a buildMetric function as suggested by clintropolis, and removed redundant tests for empty strings as they're only used when calling EventConverter directly without going through OpentsdbEmitterConfig. * consistent naming of tests PR #9372 Changed names of tests in files to match better with what it was actually testing changed check for Strings.isNullOrEmpty to just check for `null`, as empty string valued `namespacePrefix` is handled in OpentsdbEmitterConfig. Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>	2020-02-20 14:01:41 -08:00

2100 Commits