druid

Commit Graph

Author	SHA1	Message	Date
Rohan Garg	eabce8a159	Fix flakiness in query-retry ITs (#12818 )	2022-08-02 17:20:16 +05:30
Kashif Faraz	8dc4a155c7	Fix flaky IT: ITPerfectRollupParallelBatchIndexTest (#12737 ) * Increase worker.intermediaryPartitionTimeout in ITs to 30 mins * Update timeout to 60 mins * Remove timeout change from indexer	2022-07-09 17:15:51 +05:30
Maytas Monsereenusorn	1558ef471c	Add some debug tips for debugging peons (#12697 ) * add some debug tips * address comments * fix typo	2022-07-09 01:47:25 -07:00
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678 )	2022-06-22 09:05:03 +05:30
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Frank Chen	c8ddf60851	Upgrade RSA Key from 1024 bit to 4096 to eliminate warnings (#11743 ) * eliminate warnings * Change the keyStore type to PKCS12	2022-01-11 13:24:09 +08:00
Jihoon Son	4a74c5adcc	Use Druid's extension loading for integration test instead of maven (#12095 ) * Use Druid's extension loading for integration test instead of maven * fix maven command * override config path * load input format extensions and kafka by default; add prepopulated-data group * all docker-composes are overridable * fix s3 configs * override config for all * fix docker_compose_args * fix security tests * turn off debug logs for overlord api calls * clean up stuff * revert docker-compose.yml * fix override config for query error test; fix circular dependency in docker compose * add back some dependencies in docker compose * new maven profile for integration test * example file filter	2022-01-05 23:33:04 -08:00
Frank Chen	2e3767bef0	Use the last ip as docker host ip (#11742 )	2021-11-20 13:31:39 +08:00
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Kashif Faraz	f2d6100124	Require Datasource WRITE authorization for Supervisor and Task access (#11718 ) Follow up PR for #11680 Description Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE authorization even if they are purely informative. Changes Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks" Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history Check Datasource for all Indexing Task APIs	2021-10-08 10:39:48 +05:30
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Maytas Monsereenusorn	fc86a7a97f	fix custom coordinator duty (#11641 )	2021-08-31 14:04:00 +07:00
Clint Wylie	a09688862e	fix integration tests (#11638 ) * Update Dockerfile * Update docker_build_containers.sh * Update Dockerfile	2021-08-30 13:53:13 -07:00
Maytas Monsereenusorn	ce4dd48bb8	Support custom coordinator duties (#11601 ) * impl * fix checkstyle * fix checkstyle * fix checkstyle * add test * add test * add test * add integration tests * add integration tests * add more docs * address comments * address comments * address comments * add test * fix checkstyle * fix test	2021-08-19 11:54:11 +07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Maytas Monsereenusorn	f5d53569ca	Supervisor metadata auto cleanup failing as missing Guice injection (#11424 ) * Fix Supervisor metadata auto cleanup failing as missing Guice injection * Fix Supervisor metadata auto cleanup failing as missing Guice injection * fix IT * fix IT * Update services/src/main/java/org/apache/druid/cli/CliCoordinator.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> * fix * fix * fix * fix * fix * fix * fix Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-07-13 09:47:49 +07:00
Clint Wylie	63fcd77c38	support using mariadb connector with mysql extensions (#11402 ) * support using mariadb connector with mysql extensions * cleanup and more tests * fix test * javadocs, more tests, etc * style and more test * more test more better * missing pom * more pom	2021-07-08 12:25:37 -07:00
zachjsh	27f1b6cbf3	Fix Index hadoop failing with index.zip is not a valid DFS filename (#11316 ) * * Fix bug * * simplify class loading * * fix example configs for integration tests * Small classloader cleanup Co-authored-by: jon-wei <jon.wei@imply.io>	2021-06-01 19:14:50 -04:00
Maytas Monsereenusorn	e5633d7842	Fix bug: 502 bad gateway thrown when we edit/delete any auto compaction config created 0.21.0 or before (#11311 ) * fix bug * add test * fix IT * fix checkstyle * address comments	2021-05-27 16:34:32 -07:00
Xavier Léauté	b517c3339b	remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073 ) With this change, Druid will only support ZooKeeper 3.5.x and later. In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x (see #10780 for the detailed reasons) * remove ZooKeeper 3.4.x compatibility * exclude additional ZK 3.5.x netty dependencies to ensure we use our version * keep ZooKeeper version used for integration tests in sync with client library version * remove the need to specify ZK version at runtime for docker * add support to run integration tests with JDK 15 * build and run unit tests with Java 15 in travis	2021-05-25 12:49:49 -07:00
Agustin Gonzalez	383daa4029	Catch exception inside ITRetryUtil to fix one of the causes for flaky integration tests (#11265 ) * Do not stop retrying when an exception is encountered. Save & propagate last exception if retry count is exceeded. * Add one more log message to help with debugging * Limit schema registry heap to attempt to control OOMs	2021-05-19 13:56:02 -07:00
Yi Yuan	3be8e29269	Add integration test for protobuf (#11126 ) * add file test * test * for test * bug fixed * test * test * test * bug fixed * delete auto scaler * add input format * add extensions * bug fixed * bug fixed * bug fixed * revert * add schema registry test * bug fixed * bug fixed * delete desc * delete change * add desc * bug fixed * test inputformat * bug fixed * bug fixed * bug fixed * bug fixed * delete io exception * change builder not static * change pom * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-05-17 15:45:07 -07:00
Xavier Léauté	0296f20551	upgrade Apache Kafka to 2.8.0 (#11139 ) * upgrade to Apache Kafka 2.8.0 (release notes: https://downloads.apache.org/kafka/2.8.0/RELEASE_NOTES.html) * pass Kafka version as a Docker argument in integration tests to keep in sync with maven version * fix use of internal Kafka APIs in integration tests	2021-04-24 08:27:07 -07:00
Jihoon Son	a6a2758095	More unit tests for JsonParserIterator; Integration tests for query errors (#11091 ) * unit tests for timeout exception in init * integration tests * run integraion test on travis * fix inspection	2021-04-12 15:08:50 -07:00
Xavier Léauté	15bdd6bc2f	Fix unit tests and GC settings for Java 15 (#11074 ) * JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it * Fix flaky HTTP client tests with Java 15 * Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15	2021-04-08 10:33:37 -07:00
Himadri Singh	74ae2eb71a	Fix Integration Tests (#11046 )	2021-03-30 01:03:49 +05:30
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	6aec8f0c1b	allow multiple ldap bootstrap files for integration tests (#11023 )	2021-03-23 13:18:36 -07:00
Vyatcheslav Mogilevsky	b0432be07a	Apache archive mirror (#10979 ) * Ability to use mirror of archive.apache.org * Ability to use mirror of archive.apache.org: documentation * Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction	2021-03-11 09:07:51 -08:00
Clint Wylie	96889cdebc	add avro + kafka + schema registry integration test (#10929 ) * add avro + schema registry integration test * style * retry init * maybe this * oops heh * this will fix it * review stuffs * fix comment	2021-03-08 08:12:12 -08:00
zachjsh	553f5c8570	Ldap integration tests (#10901 ) * Add integration tests for ldap extension * * refactor * * add ldap-security integration test to travis * * fix license error * * Fix failing other integration test * * break up large tests * refactor * address review comments * * fix intellij inspections failure * * remove dead code	2021-02-23 13:29:57 -08:00
Jihoon Son	397e7455ba	Increase heap to 64m for custom node (#10846 )	2021-02-03 16:23:19 -08:00
Xavier Léauté	c346ce64b1	move integration tests from ZooKeeper 3.4.x to 3.5.x (#10786 ) * move integration tests from ZooKeeper 3.4.x to 3.5.x * run a subset of our integration tests with ZK 3.4 for backwards compatibility testing. * remove need to build separate docker-base image - use multi-stage build for the base image - use openjdk base image instead of building our own JDK base - workaround Debian not including MySQL by using MariaDB - download mysql connector directly instead of using distro version * fix incorrect openssl command failing on Debian * keep mysql connector version in sync with pom version	2021-01-31 08:35:39 -08:00
Maytas Monsereenusorn	a46d561bd7	Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740 ) * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix checkstyle * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix test * fix test * add log * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * address comments * fix checkstyle * fix checkstyle * add config to skip overhead memory calculation * add test for the skipBytesInMemoryOverheadCheck config * add docs * fix checkstyle * fix checkstyle * fix spelling * address comments * fix travis * address comments	2021-01-27 00:34:56 -08:00
Clint Wylie	74fbdd322d	refactor NodeRole so extensions can participate in disco and announcement (#10700 ) * refactor NodeRole so extensions can participate in disco and announcement * fixes, maybe * retries * javadoc * fix * spelling	2020-12-24 15:29:32 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
Gian Merlino	96a387d972	Fixes and tests related to the Indexer process. (#10631 ) * Fixes and tests related to the Indexer process. Three bugs fixed: 1) Indexers would not announce themselves as segment servers if they did not have storage locations defined. This used to work, but was broken in #9971. Fixed this by adding an "isSegmentServer" method to ServerType and updating SegmentLoadDropHandler to always announce if this method returns true. 2) Certain batch task types were written in a way that assumed "isReady" would be called before "run", which is not guaranteed. In particular, they relied on it in order to initialize "taskLockHelper". Fixed this by updating AbstractBatchIndexTask to ensure "isReady" is called before "run" for these tasks. 3) UnifiedIndexerAppenderatorsManager did not properly handle complex datasources. Introduced DataSourceAnalysis in order to fix this. Test changes: 1) Add a new "docker-compose.cli-indexer.yml" config that spins up an Indexer instead of a MiddleManager. 2) Introduce a "USE_INDEXER" environment variable that determines if docker-compose will start up an Indexer or a MiddleManager. 3) Duplicate all the jdk8 tests and run them in both MiddleManager and Indexer mode. 4) Various adjustments to encourage fail-fast errors in the Docker build scripts. 5) Various adjustments to speed up integration tests and reduce memory usage. 6) Add another Mac-specific approach to determining a machine's own IP. This was useful on my development machine. 7) Update segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused). Javadoc updates: 1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX that make it clear when taskLockHelper will be initialized as a side effect. (Related to the second bug above.) 2) Task: Clarified that "isReady" is not guaranteed to be called before "run". It was already implied, but now it's explicit. 3) ZkCoordinator: Clarified deprecation message. 4) DataSegmentServerAnnouncer: Clarified deprecation message. * Fix stop_cluster script. * Fix sanity check in script. * Fix hashbang lines. * Test and doc adjustments. * Additional tests, and adjustments for tests. * Split ITs back out. * Revert change to druid_coordinator_period_indexingPeriod. * Set Indexer capacity to match MM. * Bump up Historical memory. * Bump down coordinator, overlord memory. * Bump up Broker memory.	2020-12-08 16:02:26 -08:00
Gian Merlino	b681861f05	Speed up integration tests in two ways. (#10648 ) 1) Accelerate coordinator runs to speed up segment load after publishing. 2) For streaming ingestion tests, Instead of waiting 3 minutes for data to load, wait until the expected number of rows is loaded. Also updates segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused).	2020-12-07 10:59:29 -08:00
Jihoon Son	8657b23ab2	Integration tests and docs for auto compaction with different partitioning (#10354 ) * Working * add test * doc * fix test * split other integration test * exclude other-index from other tests * doc anchor fix * adjust task slots and number of merge tasks * spell check * reduce maxNumConcurrentSubTasks to 1 * maxNumConcurrentSubtasks for range partitinoing * reduce memory for historical * change group name	2020-09-15 11:28:09 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Clint Wylie	7620b0c54e	Segment backed broadcast join IndexedTable (#10224 ) * Segment backed broadcast join IndexedTable * fix comments * fix tests * sharing is caring * fix test * i hope this doesnt fix it * filter by schema to maybe fix test * changes * close join stuffs so it does not leak, allow table to directly make selector factory * oops * update comment * review stuffs * better check	2020-08-20 14:12:39 -07:00
Clint Wylie	e053348f74	add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219 ) * add isNullable to ColumnCapabilities, ColumnAnalysis * better builder * fix segment metadata queries in integration tests * adjustments * cleanup * fix spotbugs * treat unknown as true in segmentmetadata * rename to hasNulls, add docs * fixup * test the dim indexer selector isNull fix for numeric columns * fixes * oof	2020-08-13 14:55:32 -07:00
Atul Mohan	06539bc828	Set default server.maxsize to the sum of segment cache (#10255 ) * Default server.maxsize * Remove maxsize refs from config Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-10 09:21:22 -07:00
Jihoon Son	6fdce36e41	Add integration tests for query retry on missing segments (#10171 ) * Add integration tests for query retry on missing segments * add missing dependencies; fix travis conf * address comments * Integration tests extension * remove unused dependency * remove druid_main * fix java agent port	2020-07-22 22:30:35 -07:00
Maytas Monsereenusorn	dd7a32ad48	Fix ITSqlInputSourceTest (#10194 ) * Fix ITSqlInputSourceTest.java * Fix ITSqlInputSourceTest.java * Fix ITSqlInputSourceTest.java * fix * fix * fix * fix * fix * fix * fix * fix	2020-07-21 09:52:13 -07:00
Maytas Monsereenusorn	859ff6e9c0	Reduce memory footprint of integration test by not starting unneeded containers (#10150 ) * Reduce memory footprint of integration test * fix README * fix README * fix error in script * fix security IT	2020-07-08 09:46:18 -07:00


107 Commits