druid/web-console
zachjsh 9d4e8053a4
Kinesis adaptive memory management (#15360)
### Description

Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings 

This PR fixes a problem with that logic: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance.

Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes.

We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html).

This PR:

Eliminates `recordsPerFetch`; fetches always use the maximum limit of 10000 records (the default limit if not set).

Eliminates `deaggregate`; deaggregation is always enabled.

Caps `fetchThreads` to ensure that if each fetch returns the maximum (`10MB`), we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so I don't think this will change the number of threads for many deployments.

Adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We know the byte size of Kinesis records by this point. The default should be `100MB` or `10% of heap`, whichever is smaller.

Adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. The default is `1000000` bytes.

Deprecates `recordBufferSize`; use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified.

Deprecates `maxRecordsPerPoll`; use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified.
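
The sizing rules in this list can be sketched as plain arithmetic. This is an illustrative sketch, not the Java code from the PR: the function names are hypothetical, `MB` is assumed to mean 1,000,000 bytes, the fetch budget is assumed to be the smaller of `100MB` and `5% of heap` (which is consistent with the stated maximum of `10` threads), and the floor of one thread is also an assumption.

```typescript
// Constants taken from the description; MB = 1,000,000 bytes is an assumption.
const MAX_GET_RECORDS_BYTES = 10_000_000; // GetRecords returns at most 10MB
const BUDGET_CAP_BYTES = 100_000_000; // 100MB

// Hypothetical helper: largest thread count whose worst-case fetches fit the budget.
// Assumes the budget is min(100MB, 5% of heap) and that at least one thread is kept.
function maxFetchThreads(heapBytes: number): number {
  const budget = Math.min(BUDGET_CAP_BYTES, heapBytes / 20);
  return Math.max(1, Math.floor(budget / MAX_GET_RECORDS_BYTES));
}

// Hypothetical helper: default recordBufferSizeBytes, i.e. min(100MB, 10% of heap).
function defaultRecordBufferSizeBytes(heapBytes: number): number {
  return Math.floor(Math.min(BUDGET_CAP_BYTES, heapBytes / 10));
}
```

Under these assumptions a 4GB heap allows 10 fetch threads and a 100MB buffer, while a 1GB heap allows 5 fetch threads.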

Fixed an issue where, when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for the queue to stop being full. If the queue remains full, we’d end up right back waiting for it after the restarted fetch anyway.

There was also a call to `newQ::offer` in `filterBufferAndResetBackgroundFetch` whose return value was not checked, which seemed like it could cause data loss. The return value is now checked, and the operation fails if it is false.
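
The queue behaviors described above (a checked, non-blocking `offer`, an insert that waits without a timeout, and a bytes-limited poll) can be illustrated with a small byte-bounded buffer. This is a minimal sketch with hypothetical names (`SizedRecord`, `ByteBoundedBuffer`), not Druid's Java implementation.

```typescript
// Hypothetical record type: payload plus its size in bytes.
interface SizedRecord {
  data: string;
  bytes: number;
}

// Hypothetical buffer bounded by total bytes rather than record count.
class ByteBoundedBuffer {
  private readonly capacityBytes: number;
  private readonly items: SizedRecord[] = [];
  private usedBytes = 0;
  private waiters: Array<() => void> = [];

  constructor(capacityBytes: number) {
    this.capacityBytes = capacityBytes;
  }

  // Non-blocking insert. Callers must check the result: false means the
  // record was NOT stored, so ignoring it would silently lose data.
  offer(record: SizedRecord): boolean {
    if (this.usedBytes + record.bytes > this.capacityBytes) return false;
    this.items.push(record);
    this.usedBytes += record.bytes;
    return true;
  }

  // Waits with no timeout until the record fits, instead of discarding it.
  // Note: a record larger than the whole capacity would wait forever.
  async put(record: SizedRecord): Promise<void> {
    while (!this.offer(record)) {
      await new Promise<void>(resolve => this.waiters.push(resolve));
    }
  }

  // Removes records from the head until adding the next one would exceed
  // maxBytes. Always returns at least one record if the buffer is non-empty.
  poll(maxBytes: number): SizedRecord[] {
    const out: SizedRecord[] = [];
    let total = 0;
    while (this.items.length > 0) {
      const next = this.items[0];
      if (next === undefined) break;
      if (out.length > 0 && total + next.bytes > maxBytes) break;
      this.items.shift();
      out.push(next);
      total += next.bytes;
    }
    this.usedBytes -= total;
    // Wake every waiting producer so each can retry its offer.
    const toWake = this.waiters;
    this.waiters = [];
    toWake.forEach(wake => wake());
    return out;
  }
}
```

A producer that uses `put` never drops the remainder of a fetch result when the buffer is full; it resumes as soon as a `poll` frees enough bytes.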

### Release Note

Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made:

Eliminates `recordsPerFetch`; fetches always use the maximum limit of 10000 records (the default limit if not set).

Eliminates `deaggregate`; deaggregation is always enabled.

Caps `fetchThreads` to ensure that if each fetch returns the maximum (`10MB`), we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so I don't think this will change the number of threads for many deployments.

Adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We know the byte size of Kinesis records by this point. The default should be `100MB` or `10% of heap`, whichever is smaller.

Adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. The default is `1000000` bytes.

Deprecates `recordBufferSize`; use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified.

Deprecates `maxRecordsPerPoll`; use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified.
2024-01-19 14:30:21 -05:00
assets Web console: add tile for Azure Event Hubs (via Kafka API) (#10317) 2020-08-31 20:58:52 -07:00
e2e-tests Web console: Log out any request errors in e2e tests for better CI debugging (#15483) 2023-12-05 14:23:47 -08:00
lib Web console: Data loader should allow for multiline JSON messages in kafka (#13709) 2023-01-25 21:23:18 -08:00
script Web console: add explore view (#14602) 2023-07-21 11:19:23 +05:30
src Kinesis adaptive memory management (#15360) 2024-01-19 14:30:21 -05:00
typings Upgrade typescript and other dependencies (#13762) 2023-02-06 23:12:54 -08:00
.editorconfig Web console: update dev dependencies (#11119) 2021-04-16 20:15:19 -07:00
.eslintrc.js Upgrade typescript and other dependencies (#13762) 2023-02-06 23:12:54 -08:00
.gitignore Web console: show segment sizes in rows not bytes (#10496) 2020-10-13 13:19:39 -07:00
.npmrc Upgrades the React dependency to v18 (#14380) 2023-06-09 12:09:13 -07:00
.stylelintrc.json Web console: Switch to ESLint (#11142) 2021-04-22 19:33:03 -07:00
README.md Web console: Misc table fixes (#12489) 2022-05-03 12:08:08 -07:00
babel.config.js Web console: Remove support for IE11 and other older browsers (#11357) 2021-06-10 19:05:40 -07:00
console-config.js Web console: Switch to ESLint (#11142) 2021-04-22 19:33:03 -07:00
favicon.png Web console: refresh and tighten up the console styles 💅💫 (#10515) 2020-10-20 22:11:29 -07:00
jest.common.config.js Web console: Switch to ESLint (#11142) 2021-04-22 19:33:03 -07:00
jest.e2e.config.js Web console: update dev dependencies (#12240) 2022-02-08 16:37:36 -08:00
jest.unit.config.js Upgrade typescript and other dependencies (#13762) 2023-02-06 23:12:54 -08:00
package-lock.json Web console: Update webpack-dev-server v3 to v4 (#15555) 2023-12-13 16:16:54 -08:00
package.json Web console: Update webpack-dev-server v3 to v4 (#15555) 2023-12-13 16:16:54 -08:00
pom.xml Update com.github.eirslett to fix bad zip issue (#15556) 2023-12-13 17:22:54 -08:00
tsconfig.json Upgrade typescript and other dependencies (#13762) 2023-02-06 23:12:54 -08:00
tsconfig.test.json Web console: update dev dependencies (#11119) 2021-04-16 20:15:19 -07:00
unified-console.html Prepare master for Druid 29 (#15121) 2023-10-11 10:33:45 +05:30
webpack.config.js Web console: Update webpack-dev-server v3 to v4 (#15555) 2023-12-13 16:16:54 -08:00

README.md

Apache Druid web console

This is the Druid web console that serves as a data management interface for Druid.

Developing the console

Getting started

  1. You need to be within the web-console directory
  2. Install the modules with npm install
  3. Run npm run compile to compile the scss files (this usually needs to be done only once)
  4. Run npm start to start the console in development mode; it will proxy Druid requests to localhost:8888

Note: you can provide an environment variable to proxy to a different Druid host like so: druid_host=1.2.3.4:8888 npm start

Note: you can provide an environment variable to use webpack-bundle-analyzer as a plugin in the build script like so: BUNDLE_ANALYZER_PLUGIN='TRUE' npm start

To try the console in (say) coordinator mode you could run it as such:

druid_host=localhost:8081 npm start

Developing

You should use a TypeScript-friendly IDE (such as WebStorm or VS Code) to develop the web console.

The console relies on eslint (and various plugins), sass-lint, and prettier to enforce code style. If you are going to do any non-trivial development you should set up your IDE to automatically lint and fix your code as you make changes.

Configuring WebStorm

  • Preferences | Languages & Frameworks | JavaScript | Code Quality Tools | ESLint

    • Select "Automatic ESLint Configuration"
    • Check "Run eslint --fix on save"
  • Preferences | Languages & Frameworks | JavaScript | Prettier

    • Set "Run for files" to {**/*,*}.{js,ts,jsx,tsx,css,scss}
    • Check "On code reformat"
    • Check "On save"

Configuring VS Code

  • Install dbaeumer.vscode-eslint extension
  • Install esbenp.prettier-vscode extension
  • Open User Settings (JSON) and set the following:
      "editor.defaultFormatter": "esbenp.prettier-vscode",
      "editor.formatOnSave": true,
      "editor.codeActionsOnSave": {
        "source.fixAll.eslint": true
      }
    

Auto-fixing manually

It is also possible to auto-fix and format code without making IDE changes by running the following script:

  • npm run autofix — run code linters and formatter

You could also run fixers individually:

  • npm run eslint-fix — run code linter and fix issues
  • npm run sasslint-fix — run style linter and fix issues
  • npm run prettify — reformat code and styles

Updating the list of license files

If you change the dependencies of the console in any way please run script/licenses (from the web-console directory). It will analyze the changes and update the ../licenses file as needed.

Please be careful not to introduce dependencies on packages with Apache-incompatible licenses.

Running end-to-end tests

From the web-console directory:

  1. Build the Druid distribution: script/druid build
  2. Start the Druid cluster: script/druid start
  3. Run the end-to-end tests: npm run test-e2e
  4. Stop the Druid cluster: script/druid stop

If you already have a Druid cluster running on the standard ports, the steps to build, start, and stop a Druid cluster can be skipped.

Screenshots for debugging

e2e-tests/util/debug.ts:saveScreenshotIfError() is used to save a screenshot of the web console when the test fails. For example, if e2e-tests/tutorial-batch.spec.ts fails, it will create load-data-from-local-disk-error-screenshot.png.

Disabling headless mode

Disabling headless mode while running the tests can be helpful. This can be done via the DRUID_E2E_TEST_HEADLESS environment variable, which defaults to true.

Like so: DRUID_E2E_TEST_HEADLESS=false npm run test-e2e

Running against alternate web console

The environment variable DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT can be used to target a web console running on a non-default port (i.e., not port 8888). For example, this environment variable can be used to target the development mode of the web console (started via npm start), which runs on port 18081.

Like so: DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT=18081 npm run test-e2e

Running and debugging a single e2e test using Jest and Playwright

  • Run - jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts
  • Debug - PWDEBUG=console jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts

Description of the directory structure

As part of this directory:

  • assets/ - The images (and other assets) used within the console
  • e2e-tests/ - End-to-end tests for the console
  • lib/ - A place where keywords and generated docs live.
  • public/ - The compiled destination for the files powering this console
  • script/ - Some helper bash scripts for running this console
  • src/ - This directory (together with lib) constitutes all the source code for this console

List of non-SQL data reading APIs used

GET /status
GET /druid/indexer/v1/supervisor?full
POST /druid/indexer/v1/worker
GET /druid/indexer/v1/workers
GET /druid/indexer/v1/tasks
GET /druid/coordinator/v1/loadqueue?simple
GET /druid/coordinator/v1/config
GET /druid/coordinator/v1/metadata/datasources?includeUnused
GET /druid/coordinator/v1/rules
GET /druid/coordinator/v1/config/compaction
GET /druid/coordinator/v1/tiers
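
As an illustration of how one of the GET endpoints above could be called, here is a minimal sketch. It is not the console's own request code: `druidUrl` and `getJson` are hypothetical helpers, and the default base URL assumes the localhost:8888 address mentioned in the getting started steps. It relies on the standard `fetch` API being available (browsers and recent Node.js versions).

```typescript
// Assumption: Druid is reachable at localhost:8888, as in "Getting started".
const DEFAULT_BASE = 'http://localhost:8888';

// Hypothetical helper: joins a base URL and one of the paths listed above,
// dropping any trailing slashes from the base first.
function druidUrl(path: string, base: string = DEFAULT_BASE): string {
  return base.replace(/\/+$/, '') + path;
}

// Hypothetical helper: GETs a path and returns the parsed JSON body.
// The response shape is left as `unknown` because it differs per endpoint.
async function getJson(path: string, base?: string): Promise<unknown> {
  const resp = await fetch(druidUrl(path, base));
  if (!resp.ok) {
    throw new Error(`GET ${path} failed with HTTP ${resp.status}`);
  }
  return resp.json();
}
```

For example, `getJson('/status')` would request http://localhost:8888/status, and `getJson('/druid/indexer/v1/tasks', 'http://localhost:8081')` would target a different host.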