### Description

Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings

There is a problem with the logic that this PR fixes: the memory limits rely on a hard-coded "estimated record size" that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn't needed, leading to under-utilization of memory and poor ingestion performance.

Users don't always know if their records are aggregated or not. Also, even if they could figure it out, it's better to not have to. So we'd like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes.

We take advantage of the fact that GetRecords doesn't return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html).

This PR:

- eliminates `recordsPerFetch`, always using the max limit of 10000 records (the default limit if not set)
- eliminates `deaggregate`, always treating it as true
- caps `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments
- adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We do know the byte size of Kinesis records by this point. Default should be `100MB` or `10% of heap`, whichever is smaller.
- adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. Default is `1000000` bytes.
- deprecates `recordBufferSize`, use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified
- deprecates `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified
- fixes an issue where, when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for the queue to stop being full. If the queue remains full, we'll end up right back waiting for it after the restarted fetch.
- There was also a call to `newQ::offer` without a check of its return value in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. The return value is now checked here, and the call fails if it is false.

### Release Note

Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made:

- eliminates `recordsPerFetch`, always using the max limit of 10000 records (the default limit if not set)
- eliminates `deaggregate`, always treating it as true
- caps `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments
- adds `recordBufferSizeBytes` as a bytes-based limit rather than a records-based limit for the shared queue. We do know the byte size of Kinesis records by this point. Default should be `100MB` or `10% of heap`, whichever is smaller.
- adds `maxBytesPerPoll` as a bytes-based limit for how much data we poll from the shared buffer at a time. Default is `1000000` bytes.
- deprecates `recordBufferSize`, use `recordBufferSizeBytes` instead. A warning is logged if `recordBufferSize` is specified
- deprecates `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. A warning is logged if `maxRecordsPerPoll` is specified
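The budget arithmetic in the description can be illustrated with a short sketch. This is not the code from the PR: the function and field names are hypothetical, the megabyte is assumed to be decimal, the fetch budget is assumed to be the smaller of its two limits, and the floor of one thread is an assumption as well. Only the constants (`10MB` per GetRecords call, `100MB`, `5%` and `10%` of heap) come from the text.

```typescript
// Sketch of the byte-based budget arithmetic; all names are hypothetical.
const MB = 1_000_000; // assumption: decimal megabytes (the real code may use 1024 * 1024)

// GetRecords returns at most 10MB per call (from the description).
const MAX_GET_RECORDS_BYTES = 10 * MB;

interface KinesisMemoryDefaults {
  fetchThreadCap: number;
  recordBufferSizeBytes: number;
}

function computeKinesisMemoryDefaults(maxHeapBytes: number): KinesisMemoryDefaults {
  // Fetch budget: 100MB or 5% of heap (assumed here: whichever is smaller).
  const fetchBudgetBytes = Math.min(100 * MB, Math.floor(maxHeapBytes / 20));

  // Assume every in-flight fetch returns the full 10MB.
  // Keeping at least one thread is an assumption of this sketch.
  const fetchThreadCap = Math.max(1, Math.floor(fetchBudgetBytes / MAX_GET_RECORDS_BYTES));

  // Shared queue limit: 100MB or 10% of heap, whichever is smaller.
  const recordBufferSizeBytes = Math.min(100 * MB, Math.floor(maxHeapBytes / 10));

  return { fetchThreadCap, recordBufferSizeBytes };
}
```

Under these assumptions a 4GB heap hits both 100MB ceilings, which gives the cap of 10 threads mentioned in the description, while a 500MB heap yields a 25MB fetch budget (2 threads) and a 50MB record buffer.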
Apache Druid web console
This is the Druid web console that serves as a data management interface for Druid.
Developing the console
Getting started
- You need to be within the `web-console` directory
- Install the modules with `npm install`
- Run `npm run compile` to compile the scss files (this usually needs to be done only once)
- Run `npm start` to start in development mode; Druid requests will be proxied to `localhost:8888`
Note: you can provide an environment variable to proxy to a different Druid host like so: druid_host=1.2.3.4:8888 npm start
Note: you can provide an environment variable to use webpack-bundle-analyzer as a plugin in the build script, like so: BUNDLE_ANALYZER_PLUGIN='TRUE' npm start
To try the console in (say) coordinator mode you could run it as such:
druid_host=localhost:8081 npm start
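The host selection described above amounts to "use `druid_host` if set, otherwise fall back to `localhost:8888`". A minimal sketch of that rule follows; `resolveDruidHost` is a hypothetical helper written for illustration, and the console's actual webpack configuration may read the variable differently.

```typescript
// Hypothetical helper, not taken from the console's build scripts.
const DEFAULT_DRUID_HOST = 'localhost:8888';

function resolveDruidHost(env: Record<string, string | undefined>): string {
  const host = env['druid_host'];
  // Treat an unset or blank variable as "use the default" (an assumption of this sketch).
  return host !== undefined && host.trim() !== '' ? host.trim() : DEFAULT_DRUID_HOST;
}

// In a Node build script this would typically be called as resolveDruidHost(process.env).
```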
Developing
You should use a TypeScript friendly IDE (such as WebStorm, or VS Code) to develop the web console.
The console relies on eslint (and various plugins), sass-lint, and prettier to enforce code style. If you are going to do any non-trivial development you should set up your IDE to automatically lint and fix your code as you make changes.
Configuring WebStorm
- Preferences | Languages & Frameworks | JavaScript | Code Quality Tools | ESLint
  - Select "Automatic ESLint Configuration"
  - Check "Run eslint --fix on save"
- Preferences | Languages & Frameworks | JavaScript | Prettier
  - Set "Run for files" to `{**/*,*}.{js,ts,jsx,tsx,css,scss}`
  - Check "On code reformat"
  - Check "On save"
Configuring VS Code
- Install `dbaeumer.vscode-eslint` extension
- Install `esbenp.prettier-vscode` extension
- Open User Settings (JSON) and set the following:

      "editor.defaultFormatter": "esbenp.prettier-vscode",
      "editor.formatOnSave": true,
      "editor.codeActionsOnSave": {
        "source.fixAll.eslint": true
      }
Auto-fixing manually
It is also possible to auto-fix and format code without making IDE changes by running the following script:
- `npm run autofix` - run code linters and formatter

You could also run fixers individually:

- `npm run eslint-fix` - run code linter and fix issues
- `npm run sasslint-fix` - run style linter and fix issues
- `npm run prettify` - reformat code and styles
Updating the list of license files
If you change the dependencies of the console in any way please run `script/licenses` (from the web-console directory). It will analyze the changes and update the `../licenses` file as needed.
Please be conscious of not introducing dependencies on packages with Apache incompatible licenses.
Running end-to-end tests
From the web-console directory:
- Build druid distribution: `script/druid build`
- Start druid cluster: `script/druid start`
- Run end-to-end tests: `npm run test-e2e`
- Stop druid cluster: `script/druid stop`
If you already have a druid cluster running on the standard ports, the steps to build/start/stop a druid cluster can be skipped.
Screenshots for debugging
`e2e-tests/util/debug.ts:saveScreenshotIfError()` is used to save a screenshot of the web console when the test fails. For example, if `e2e-tests/tutorial-batch.spec.ts` fails, it will create `load-data-from-local-disk-error-screenshot.png`.
Disabling headless mode
Disabling headless mode while running the tests can be helpful. This can be done via the `DRUID_E2E_TEST_HEADLESS` environment variable, which defaults to `true`.
Like so: `DRUID_E2E_TEST_HEADLESS=false npm run test-e2e`
Running against alternate web console
The environment variable `DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT` can be used to target a web console running on a non-default port (i.e., not port `8888`). For example, this environment variable can be used to target the development mode of the web console (started via `npm start`), which runs on port `18081`.
Like so: `DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT=18081 npm run test-e2e`
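The two e2e environment variables and their documented defaults (headless `true`, port `8888`) can be summarized in a small sketch. `readE2eTestSettings` is a hypothetical function written for illustration and is not the helper used by the test suite; in particular, treating only the literal string `false` as "headless off" and falling back to `8888` on an unparsable port are assumptions.

```typescript
// Hypothetical reader for the e2e settings; names and parsing rules are illustrative only.
interface E2eTestSettings {
  headless: boolean;
  unifiedConsolePort: number;
}

function readE2eTestSettings(env: Record<string, string | undefined>): E2eTestSettings {
  // Headless defaults to true; assumption: only "false" (any casing) disables it.
  const headless = (env['DRUID_E2E_TEST_HEADLESS'] ?? 'true').toLowerCase() !== 'false';

  // The port defaults to 8888; assumption: invalid values also fall back to the default.
  const rawPort = env['DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT'];
  const parsedPort = rawPort === undefined ? NaN : Number.parseInt(rawPort, 10);
  const unifiedConsolePort = Number.isInteger(parsedPort) && parsedPort > 0 ? parsedPort : 8888;

  return { headless, unifiedConsolePort };
}
```

For instance, passing `{ DRUID_E2E_TEST_HEADLESS: 'false', DRUID_E2E_TEST_UNIFIED_CONSOLE_PORT: '18081' }` describes a visible browser pointed at the development-mode console.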
Running and debugging a single e2e test using Jest and Playwright
- Run - `jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts`
- Debug - `PWDEBUG=console jest --config jest.e2e.config.js e2e-tests/tutorial-batch.spec.ts`
Description of the directory structure
As part of this directory:
- `assets/` - The images (and other assets) used within the console
- `e2e-tests/` - End-to-end tests for the console
- `lib/` - A place where keywords and generated docs live.
- `public/` - The compiled destination for the files powering this console
- `script/` - Some helper bash scripts for running this console
- `src/` - This directory (together with `lib`) constitutes all the source code for this console
List of non-SQL data reading APIs used
GET /status
GET /druid/indexer/v1/supervisor?full
POST /druid/indexer/v1/worker
GET /druid/indexer/v1/workers
GET /druid/indexer/v1/tasks
GET /druid/coordinator/v1/loadqueue?simple
GET /druid/coordinator/v1/config
GET /druid/coordinator/v1/metadata/datasources?includeUnused
GET /druid/coordinator/v1/rules
GET /druid/coordinator/v1/config/compaction
GET /druid/coordinator/v1/tiers