druid

Commit Graph

Author	SHA1	Message	Date
Maytas Monsereenusorn	84aac4832d	Add feature to automatically remove rules based on retention period (#11164 ) * Add feature to automatically remove rules based on retention period * Add feature to automatically remove rules based on retention period * address comments	2021-05-03 11:50:45 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Charles Smith	09dcf6aa36	fix syntax error for loadstatus api (#11136 )	2021-04-20 14:17:20 +08:00
Charles Smith	b51632b0bf	Update security overview with additional recommendations (#11016 ) * updatee security overview with additional recommendations for improved security * address first set of review questions * Update docs/operations/security-overview.md * Update docs/operations/security-overview.md * apply changes from review * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update security-overview.md fix additional comments & typos cc: @suneet-s, @jihoonsoon Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-04-14 08:58:17 -07:00
zhangyue19921010	95b82dd325	Add missing API references for coordinator (#10967 ) * add miss API references for coordinator * add miss API references for coordinator * add miss API references for coordinator Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-04-09 18:20:47 -07:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
zachjsh	8cf1e83543	Add paramter to loadstatus API to compute underdeplication against cluster view (#11056 ) * Add paramter to loadstatus API to compute underdeplication against cluster view This change adds a query parameter `computeUsingClusterView` to loadstatus apis that if specified have the coordinator compute undereplication for segments based on the number of services available within cluster that the segment can be replicated on, instead of the configured replication count configured in load rule. A default load rule is created in all clusters that specified that all segments should be replicated 2 times. As replicas are forced to be on separate nodes in the cluster, this causes the loadstatus api to report that there are under-replicated segments when there is only 1 data server in the cluster. In this case, calling loadstatus api without this new query parameter will always result in a response indicating under-replication of segments * * fix exception mapper * * Address review comments * * update external API docs * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * * update more external docs * * update javadoc * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-04-05 00:02:43 -04:00
Clint Wylie	470d659ca0	add documentation for coordinator dynamic configuration (#11052 )	2021-04-02 22:01:43 -07:00
Tushar Raj	6789ed0a05	Update reset-cluster.md (#10990 ) fixed Error: Could not find or load main class org.apache.druid.cli.Main	2021-03-29 20:38:35 -07:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Charles Smith	573de3bc0d	clarify security requirements around HTTPInputSource (#10914 ) * clarify security requirements around HTTPInputSource * explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-02-26 09:37:47 -08:00
zachjsh	67eff4110d	Improve Druid ldap auth documentation (#10915 ) * Improve Druid ldap auth documentation Improved the ldap auth docs by clarifying that the object classes and attributes noted are specific to Microsoft Active Directory, and could be different depending on the specific ldap server being used. Also emphasized the importance of the memberOf field and noted that the step about adding users to roles is only needed in certain circumstances. * * add another note * Apply suggestions from code review Co-authored-by: sthetland <steve.hetland@imply.io> * * simplify * * Address review comments Co-authored-by: sthetland <steve.hetland@imply.io>	2021-02-24 15:28:41 -08:00
sthetland	1e40f51e65	Fix example names of security artifacts in docs (#10882 ) * replacing example names * unrelated typos * unintended changes * a few more typo fixes	2021-02-16 14:58:50 -08:00
Lucas Capistrant	58ce2e55d8	Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284 ) * dynamic coord config adding more balancing control add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This config caps the number of segments that are iterated over when selecting a segment to move. The default value combined with current balancing strategies will still iterate over all provided segments. However, setting this value to something > 0 will cap the number of segments visited. This could make sense in cases where a cluster has a very large number of segments and the admins prefer less iterations vs a thorough consideration of all segments provided. * fix checkstyle failure * Make doc more detailed for admin to understand when/why to use new config * refactor PR to use a % of segments instead of raw number * update the docs * remove bad doc line * fix typo in name of new dynamic config * update RservoirSegmentSampler to gracefully deal with values > 100% * add handler for <= 0 in ReservoirSegmentSampler * fixup CoordinatorDynamicConfigTest naming and argument ordering * fix items in docs after spellcheck flags * Fix lgtm flag on missing space in string literal * improve documentation for new config * Add default value to config docs and add advice in cluster tuning doc * Add percentOfSegmentsToConsiderPerMove to web console coord config dialog * update jest snapshot after console change * fix spell checker errors * Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider * add new config back to web console module after merge with master * fix ReservoirSegmentSamplerTest * fix line breaks in coordinator console dialog * Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove * Make improvements based off of feedback in review * additional cleanup coming from review * Add a warning log if limit on segments to consider for move can't be calcluated * remove unused import * fix tests for CoordinatorDynamicConfig * remove precondition test that is redundant in CoordinatorDynamicConfig Builder class	2020-12-22 08:27:55 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Himanshu	be019760bb	document DynamicConfigProvider for kafka consumer properties (#10658 ) * document DynamicConfigProvider for kafka consumer properties * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md * fix doc build Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-12-10 08:24:33 -08:00
Lucas Capistrant	2560bf0a19	Add new coordinator metrics for coordinator duty runtimes (#10603 ) * Add new coordinator metrics for duty runtimes * fix spelling for a constant variable value * add comment clarifying why the global runtime metric is emitted where it is * Remove duty alias in lieu of using the class name for metrics * fix docs * CoordinatorStats tests + add duty stats to accumulate() logic	2020-11-29 14:47:35 -08:00
Atul Mohan	111b431c07	Introduce query/timeout/count metric (#10567 ) * Add timeout metric * Add tests	2020-11-20 15:17:26 -08:00
sthetland	ba915b7f56	Security overview documentation (#10339 ) * initial file * initial file * security overview added * ldap added * spacing adjustments * nits * security graphics and doc review * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-user-auth.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * updates frm review * review comments * finish up review and light edits * broken links * spell check Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2020-11-19 15:24:58 -08:00
Atul Mohan	65a42f9eb1	Update overlord api docs (#10539 )	2020-10-29 11:19:12 -05:00
Maytas Monsereenusorn	9056d113d0	Add docs and integration tests for Auto-compaction snapshot status API (#10510 ) * add docs and IT for Auto-compaction snapshot status API * fix spellings * fix test * address comments	2020-10-14 06:42:22 -07:00
Jihoon Son	ad437dd655	Add shuffle metrics for parallel indexing (#10359 ) * Add shuffle metrics for parallel indexing * javadoc and concurrency test * concurrency * fix javadoc * Feature flag * doc * fix doc and add a test * checkstyle * add tests * fix build and address comments	2020-10-10 19:35:17 -07:00
Mainak Ghosh	8168e14e92	Adding task slot count metrics to Druid Overlord (#10379 ) * Adding more worker metrics to Druid Overlord * Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better * Few more instance of worker usage replaced with peon * Modifying the peon idle count logic to only use eligible workers available capacity * Changing the naming to task slot count instead of peon * Adding some unit test coverage for the new test runner apis * Addressing Review Comments * Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics * Fixing the spelling issue in the docs * Setting the annotation Nullable on the TaskSlotCountStatsProvider methods	2020-09-28 23:50:38 -07:00
Clint Wylie	b95bf444b2	add docs for kinesis lag metrics (#10435 )	2020-09-28 13:13:53 -07:00
Curt Buechter	e3735602f2	Fix typo (#10385 )	2020-09-11 16:31:36 -07:00
Atul Mohan	06539bc828	Set default server.maxsize to the sum of segment cache (#10255 ) * Default server.maxsize * Remove maxsize refs from config Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-10 09:21:22 -07:00
Clint Wylie	0f51b3c190	fix dropwizard emitter jvm bufferpoolName metric (#10075 ) * fix dropwizard emitter jvm bufferpoolName metric * fixes	2020-06-25 12:20:25 -07:00
sthetland	978b494b46	Druid user permissions (#10047 ) * Druid user permissions apply in the console * Update index.md * noting user warning in console page; some minor shuffling * noting user warning in console page; some minor shuffling 1 * touchups * link checking fixes * Updated per suggestions	2020-06-23 17:39:48 -07:00
Maytas Monsereenusorn	1a2620606d	API to verify a datasource has the latest ingested data (#9965 ) * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * fix checksyle * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * API to verify a datasource has the latest ingested data * fix spelling * address comments * fix checkstyle * update docs * fix tests * fix doc * address comments * fix typo * fix spelling * address comments * address comments * fix typo in docs	2020-06-16 20:48:30 -10:00
Jonathan Wei	fe2f656427	Fix broadcast rule drop and docs (#10019 ) * Fix broadcast rule drop and docs * Remove racy test check * Don't drop non-broadcast segments on tasks, add overshadowing handling * Don't use realtimes for overshadowing * Fix dropping for ingestion services	2020-06-12 02:33:28 -07:00
danc	5da78d13af	Update password-provider.md (#9857 )	2020-06-10 09:32:49 -07:00
Maytas Monsereenusorn	6130a834c2	Update doc on tmp dir (java.io.tmpdir) best practice (#9910 ) * Update doc on tmp dir best practice * remove local recommendation	2020-05-26 09:37:01 -07:00
sthetland	ce03f31a73	Clarifying workerThreads and a few other nits (#9804 ) * Update data-formats.md Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }" * Light text cleanup * Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here. * Clarifying accepted values for URI lookup * Update index.md * original quickstart full first pass * original quickstart full first pass * first pass all the way through * straggler * image touchups and finished old tutorial * a bit of finishing up * druid-caffeine-cache ext previously removed * Sample MaxDirectMemorySize value unrealistic * Review comments * fixing links * spell checking gymnastics * workerThreads desc slightly expanded * typo * Typo * Reversing Kafka config order * Changing order of configs for Kinesis * Trying this again: ioConfig then tuningConfig	2020-05-06 09:05:18 -07:00
Gian Merlino	42590ae64b	Refresh query docs. (#9704 ) * Refresh query docs. Larger changes: - New doc: querying/datasource.md describes the various kinds of datasources you can use, and has examples for both SQL and native. - New doc: querying/query-execution.md describes how native queries are executed at a high level. It doesn't go into the details of specific query engines or how queries run at a per-segment level. But I think it would be good to add or link that content here in the future. - Refreshed doc: querying/sql.md updated to refer to joins, reformatted a bit, added a new "Query translation" section that explains how queries are translated from SQL to native, and removed configuration details (moved to configuration/index.md). - Refreshed doc: querying/joins.md updated to refer to join datasources. Smaller changes: - Add helpful banners to the top of query documentation pages telling people whether a given page describes SQL, native, or both. - Add SQL metrics to operations/metrics.md. - Add some color and cross-links in various places. - Add native query component docs to the sidebar, and renamed them so they look nicer. - Remove Select query from the sidebar. - Fix Broker SQL configs in configuration/index.md. Remove them from querying/sql.md. - Combined querying/searchquery.md and querying/searchqueryspec.md. * Updates. * Fix numbering. * Fix glitches. * Add new words to spellcheck file. * Assorted changes. * Further adjustments. * Add missing punctuation.	2020-04-15 16:12:20 -07:00
Maytas Monsereenusorn	8328d91b30	Add missing integration tests for the compaction by the coordinator (#9644 ) * Add API to trigger a compaction by the coordinator for integration tests * Add missing integration tests for the compaction by the coordinator * address comments	2020-04-15 14:27:33 -07:00
Clint Wylie	bf85ea19b2	roaring bitmaps by default (#9548 ) * it is finally time * fix it * more docs * fix doc	2020-03-23 18:15:57 -07:00
Roman Leventov	b9186f8f9f	Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306 ) * Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error * Fix brace * Import order * Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill * Fix tests * Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY * More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters * Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig * More variable and method renames * Rename MetadataSegments to SegmentsMetadata * Javadoc update * Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs * Update Javadoc of VersionedIntervalTimeline.iterateAllObjects() * Reorder imports * Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers * Complete merge * Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests * Remove MetadataSegmentManager * Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments * Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder * Fix inspections * Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest * Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods * Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator * Unused import * Optimize imports * Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata() * Unused import * Update terminology in datasource-view.tsx * Fix label in datasource-view.spec.tsx.snap * Fix lint errors in datasource-view.tsx * Doc improvements * Another attempt to please TSLint * Another attempt to please TSLint * Style fixes * Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge) * Try to fix docs build issue * Javadoc and spelling fixes * Rename SegmentsMetadata to SegmentsMetadataManager, address other comments * Address more comments	2020-01-27 11:24:29 -08:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Suneet Saldanha	92ac22d060	Link javaOpts to middlemanager runtime.properties docs (#9101 ) * Link javaOpts to middlemanager runtime.properties docs * fix broken link * reword config links	2020-01-15 21:22:49 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jihoon Son	e5e1e9c4ee	Fix broken master (#9005 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test	2019-12-11 15:56:36 -08:00
Parag Jain	24fe824055	add readiness endpoints to processes having initialization delays (#8841 )	2019-12-10 17:26:13 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	441515cb50	update dump-segment docs so example command works (#8998 ) * update dump-segment docs so example command works * not everyone uses bash	2019-12-07 06:36:46 -08:00
Chi Cao Minh	8365bdf62a	Address security vulnerabilities (#8878 ) * Address security vulnerabilities Security vulnerabilities addressed by upgrading 3rd party libs: - Upgrade avro-ipc to 1.9.1 - sonatype-2019-0115 - Upgrade caffeine to 2.8.0 - sonatype-2019-0282 - Upgrade commons-beanutils to 1.9.4 - CVE-2014-0114 - Upgrade commons-codec to 1.13 - sonatype-2012-0050 - Upgrade commons-compress to 1.19 - CVE-2019-12402 - sonatype-2018-0293 - Upgrade hadoop-common to 2.8.5 - CVE-2018-11767 - Upgrade hadoop-mapreduce-client-core to 2.8.5 - CVE-2017-3166 - Upgrade hibernate-validator to 5.2.5 - CVE-2017-7536 - Upgrade httpclient to 4.5.10 - sonatype-2017-0359 - Upgrade icu4j to 55.1 - CVE-2014-8147 - Upgrade jackson-databind to 2.6.7.3: - CVE-2017-7525 - Upgrade jetty-http to 9.4.12: - CVE-2017-7657 - CVE-2017-7658 - CVE-2017-7656 - CVE-2018-12545 - Upgrade log4j-core to 2.8.2 - CVE-2017-5645: - Upgrade netty to 3.10.6 - CVE-2015-2156 - Upgrade netty-common to 4.1.42 - CVE-2019-9518 - Upgrade netty-codec-http to 4.1.42 - CVE-2019-16869 - Upgrade nimbus-jose-jwt to 4.41.1 - CVE-2017-12972 - CVE-2017-12974 - Upgrade plexus-utils to 3.0.24 - CVE-2017-1000487 - sonatype-2015-0173 - sonatype-2016-0398 - Upgrade postgresql to 42.2.8 - CVE-2018-10936 Note that if users are using JDBC lookups with postgres, they may need to update the JDBC jar used by the lookup extension. * Fix license for postgresql	2019-11-19 09:14:33 -08:00
Himanshu	5adc8212b4	add documentation for druid docker and k8s operator (#8802 ) * add documentation for druid docker and k8s operator * address review comment and add Kubernetes to spelling file	2019-11-06 12:56:21 -08:00
Surekha	98f59ddd7e	Add `sys.supervisors` table to system tables (#8547 ) * Add supervisors table to SystemSchema * Add docs * fix checkstyle * fix test * fix CI * Add comments * Fix javadoc teamcity error * comments * fix links in docs * fix links * rename fullStatus query param to system and remove it from docs	2019-10-18 15:16:42 -07:00
Clint Wylie	8bda3afea4	fix spelling errors triggered by another doc PR (#8653 )	2019-10-08 23:43:58 -07:00
Nishant Bangarwa	0853273091	Add tier based usage metrics for historical nodes to help with autoscaling (#8636 ) * Add tier based usage metrics for historical nodes to help with druid historical autoscaling Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful - tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier tier/historical/count - no. of historical nodes available in each tier. Dimension - tier tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier * fix unit test failures	2019-10-08 19:55:32 -07:00

60 Commits