zachjsh 9d4e8053a4 Kinesis adaptive memory management (#15360)	2024-01-19 14:30:21 -05:00

### Description

Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings

There is a problem with the logic that this PR fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance.

Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes.

We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html).

This PR:

- eliminates `recordsPerFetch`, always using the max limit of 10000 records (the default limit if not set)
- eliminates `deaggregate`, always having it true
- caps `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments
- adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We do know the byte size of Kinesis records by this point. Default should be `100MB` or `10% of heap`, whichever is smaller.
- adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. Default is `1000000` bytes.
- deprecates `recordBufferSize`; use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified
- deprecates `maxRecordsPerPoll`; use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified
- fixes an issue where, when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for the queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch.
- There was also a call to `newQ::offer` without a check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking the return value here, and failing if false.

### Release Note

Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made:

- eliminates `recordsPerFetch`, always using the max limit of 10000 records (the default limit if not set)
- eliminates `deaggregate`, always having it true
- caps `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments
- adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We do know the byte size of Kinesis records by this point. Default should be `100MB` or `10% of heap`, whichever is smaller.
- adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. Default is `1000000` bytes.
- deprecates `recordBufferSize`; use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified
- deprecates `maxRecordsPerPoll`; use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified
assets	Web console: add tile for Azure Event Hubs (via Kafka API) (#10317)	2020-08-31 20:58:52 -07:00
e2e-tests	Web console: Log out any request errors in e2e tests for better CI debugging (#15483)	2023-12-05 14:23:47 -08:00
lib	Web console: Data loader should allow for multiline JSON messages in kafka (#13709)	2023-01-25 21:23:18 -08:00
script	Web console: add explore view (#14602)	2023-07-21 11:19:23 +05:30
src	Kinesis adaptive memory management (#15360)	2024-01-19 14:30:21 -05:00
typings	Upgrade typescript and other dependencies (#13762)	2023-02-06 23:12:54 -08:00
.editorconfig	Web console: update dev dependencies (#11119)	2021-04-16 20:15:19 -07:00
.eslintrc.js	Upgrade typescript and other dependencies (#13762)	2023-02-06 23:12:54 -08:00
.gitignore	Web console: show segment sizes in rows not bytes (#10496)	2020-10-13 13:19:39 -07:00
.npmrc	Upgrades the React dependency to v18 (#14380)	2023-06-09 12:09:13 -07:00
.stylelintrc.json	Web console: Switch to ESLint (#11142)	2021-04-22 19:33:03 -07:00
README.md	Web console: Misc table fixes (#12489)	2022-05-03 12:08:08 -07:00
babel.config.js	Web console: Remove support for IE11 and other older browsers (#11357)	2021-06-10 19:05:40 -07:00
console-config.js	Web console: Switch to ESLint (#11142)	2021-04-22 19:33:03 -07:00
favicon.png	Web console: refresh and tighten up the console styles ✨💅💫 (#10515)	2020-10-20 22:11:29 -07:00
jest.common.config.js	Web console: Switch to ESLint (#11142)	2021-04-22 19:33:03 -07:00
jest.e2e.config.js	Web console: update dev dependencies (#12240)	2022-02-08 16:37:36 -08:00
jest.unit.config.js	Upgrade typescript and other dependencies (#13762)	2023-02-06 23:12:54 -08:00
package-lock.json	Web console: Update webpack-dev-server v3 to v4 (#15555)	2023-12-13 16:16:54 -08:00
package.json	Web console: Update webpack-dev-server v3 to v4 (#15555)	2023-12-13 16:16:54 -08:00
pom.xml	Update com.github.eirslett to fix bad zip issue (#15556)	2023-12-13 17:22:54 -08:00
tsconfig.json	Upgrade typescript and other dependencies (#13762)	2023-02-06 23:12:54 -08:00
tsconfig.test.json	Web console: update dev dependencies (#11119)	2021-04-16 20:15:19 -07:00
unified-console.html	Prepare master for Druid 29 (#15121)	2023-10-11 10:33:45 +05:30
webpack.config.js	Web console: Update webpack-dev-server v3 to v4 (#15555)	2023-12-13 16:16:54 -08:00

README.md

Apache Druid web console

This is the Druid web console that serves as a data management interface for Druid.

Developing the console

Getting started

You need to be within the web-console directory
Install the modules with npm install
Run npm run compile to compile the scss files (this usually needs to be done only once)
Run npm start to start the console in development mode, which proxies Druid requests to localhost:8888

Note: you can provide an environment variable to proxy to a different Druid host like so: druid_host=1.2.3.4:8888 npm start

Note: you can use webpack-bundle-analyzer as a plugin in the build script by providing an environment variable like so: BUNDLE_ANALYZER_PLUGIN='TRUE' npm start

For example, to try the console in coordinator mode you could run it like this:

druid_host=localhost:8081 npm start
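
The proxy behavior described above can be pictured with a minimal sketch. It assumes the development server picks its target from the druid_host variable and falls back to localhost:8888; the function name resolveProxyTarget and the use of plain HTTP are illustrative assumptions, not code taken from the console's webpack configuration.

```typescript
// Hypothetical helper (not taken from the console's build scripts): resolves
// the proxy target from a map of environment variables.
function resolveProxyTarget(env: Record<string, string | undefined>): string {
  // Fall back to the documented default when druid_host is unset or empty.
  const host = env['druid_host'] || 'localhost:8888';
  // Assumption: the proxy talks plain HTTP to the Druid host.
  return `http://${host}`;
}

// Default target, then the coordinator example from the text.
console.log(resolveProxyTarget({}));
console.log(resolveProxyTarget({ druid_host: 'localhost:8081' }));
```

In a Node script the helper would typically be called with process.env as its argument.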

Developing

You should use a TypeScript friendly IDE (such as WebStorm, or VS Code) to develop the web console.

The console relies on eslint (and various plugins), sass-lint, and prettier to enforce code style. If you are going to do any non-trivial development you should set up your IDE to automatically lint and fix your code as you make changes.

Configuring WebStorm

Preferences | Languages & Frameworks | JavaScript | Code Quality Tools | ESLint
- Select "Automatic ESLint Configuration"
- Check "Run eslint --fix on save"
Preferences | Languages & Frameworks | JavaScript | Prettier
- Set "Run for files" to {**/*,*}.{js,ts,jsx,tsx,css,scss}
- Check "On code reformat"
- Check "On save"

Configuring VS Code

Install dbaeumer.vscode-eslint extension
Install esbenp.prettier-vscode extension

Open User Settings (JSON) and set the following:

  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.eslint": true
  }

Auto-fixing manually

It is also possible to auto-fix and format code without making IDE changes by running the following script:

npm run autofix — run code linters and formatter

You could also run fixers individually:

npm run eslint-fix — run code linter and fix issues
npm run sasslint-fix — run style linter and fix issues
npm run prettify — reformat code and styles

Updating the list of license files

If you change the dependencies of the console in any way, please run script/licenses (from the web-console directory). It will analyze the changes and update the ../licenses file as needed.

Please take care not to introduce dependencies on packages with Apache-incompatible licenses.

Running end-to-end tests

From the web-console directory:

Build Druid distribution: script/druid build
Start Druid cluster: script/druid start
Run end-to-end tests: npm run test-e2e
Stop Druid cluster: script/druid stop

If you already have a Druid cluster running on the standard ports, the steps to build/start/stop a Druid cluster can be skipped.

Screenshots for debugging

e2e-tests/util/debug.ts:saveScreenshotIfError() is used to save a screenshot of the web console when the test fails. For example, if e2e-tests/tutorial-batch.spec.ts fails, it will create load-data-from-local-disk-error-screenshot.png.

Disabling headless mode

Disabling headless mode while running the tests can be helpful. This can be done via the DRUID_E2E_TEST_HEADLESS environment variable, which defaults to true.

Like so: DRUID_E2E_TEST_HEADLESS=false npm run test-e2e

Running against alternate web console

The environment variable DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT can be used to target a web console running on a non-default port (i.e., not port 8888). For example, this environment variable can be used to target the development mode of the web console (started via npm start), which runs on port 18081.

Like so: DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT=18081 npm run test-e2e
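
As a rough illustration of how a test harness might interpret DRUID_E2E_TEST_HEADLESS and DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT, here is a minimal sketch. The defaults (true and 8888) come from the text above; the E2eSettings type, the readE2eSettings function, and the rule that only the string "false" turns headless mode off are assumptions made for the example, not the actual e2e test code.

```typescript
interface E2eSettings {
  headless: boolean;
  unifiedConsolePort: number;
}

// Hypothetical reader for the two documented environment variables.
function readE2eSettings(env: Record<string, string | undefined>): E2eSettings {
  // Assumption: only the value "false" (any letter case) disables headless mode.
  const headless = (env['DRUID_E2E_TEST_HEADLESS'] || 'true').toLowerCase() !== 'false';

  // Default to port 8888 when the variable is not set.
  const rawPort = env['DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT'];
  const port = rawPort ? parseInt(rawPort, 10) : 8888;
  if (isNaN(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid console port: ${rawPort}`);
  }

  return { headless, unifiedConsolePort: port };
}

// Settings matching the two "Like so" examples in the text.
console.log(
  readE2eSettings({
    DRUID_E2E_TEST_HEADLESS: 'false',
    DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT: '18081',
  }),
);
```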

Running and debugging a single e2e test using Jest and Playwright

Run - jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts
Debug - PWDEBUG=console jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts

Description of the directory structure

As part of this directory:

assets/ - The images (and other assets) used within the console
e2e-tests/ - End-to-end tests for the console
lib/ - A place where keywords and generated docs live.
public/ - The compiled destination for the files powering this console
script/ - Some helper bash scripts for running this console
src/ - This directory (together with lib) constitutes all the source code for this console

List of non-SQL data-reading APIs used

GET /status
GET /druid/indexer/v1/supervisor?full
POST /druid/indexer/v1/worker
GET /druid/indexer/v1/workers
GET /druid/indexer/v1/tasks
GET /druid/coordinator/v1/loadqueue?simple
GET /druid/coordinator/v1/config
GET /druid/coordinator/v1/metadata/datasources?includeUnused
GET /druid/coordinator/v1/rules
GET /druid/coordinator/v1/config/compaction
GET /druid/coordinator/v1/tiers
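
To make the list above concrete, the sketch below records a few of these endpoints as data and builds full request URLs against a Druid host. The ApiCall type and the buildUrl helper are hypothetical names introduced for illustration, and the base URL assumes the development default of localhost:8888.

```typescript
type Method = 'GET' | 'POST';

interface ApiCall {
  method: Method;
  path: string;
}

// A few entries copied from the list above.
const API_CALLS: ApiCall[] = [
  { method: 'GET', path: '/status' },
  { method: 'GET', path: '/druid/indexer/v1/supervisor?full' },
  { method: 'POST', path: '/druid/indexer/v1/worker' },
  { method: 'GET', path: '/druid/coordinator/v1/tiers' },
];

// Hypothetical helper: joins a base URL and an endpoint path, dropping any
// trailing slashes on the base so the result has exactly one separator.
function buildUrl(base: string, call: ApiCall): string {
  return base.replace(/\/+$/, '') + call.path;
}

for (const call of API_CALLS) {
  console.log(`${call.method} ${buildUrl('http://localhost:8888/', call)}`);
}
```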