* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
* add padding and keywords
* add arrayOfDoubles
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* partiton int
* fix docs
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
The batch segment sampling performs significantly better than the older method
of sampling if there are a large number of used segments. It also avoids duplicates.
Changes:
- Make batch segment sampling the default
- Deprecate the property `useBatchedSegmentSampler`
- Remove unused coordinator config `druid.coordinator.loadqueuepeon.repeatDelay`
- Cleanup `KillUnusedSegments`
- Simplify `KillUnusedSegmentsTest`, add better tests, remove redundant tests
Segment assignments can take very long due to the strategy cost computation
for a large number of segments. This commit allows segment assignments to be
done in a round-robin fashion within a tier. Only segment balancing takes cost-based
decisions to move segments around.
Changes
- Add dynamic config `useRoundRobinSegmentAssignment` with default value false
- Add `RoundRobinServerSelector`. This does not implement the `BalancerStrategy`
as it does not conform to that contract and may also be used in conjunction with a
strategy (round-robin for `RunRules` and a cost strategy for `BalanceSegments`)
- Drops are still cost-based even when round-robin assignment is enabled.
* remove old query view
* update tests
* add filter
* fix test
* bump d3 things to latest versions
* rent too far into the future with d3
* make config dialogs load
* goodies
* update snapshots
* only compute duration when running or pending
* MSQ web console
* fix typo in comments
* remove useless conditional
* wrap SQL_DATA_TYPES
* fixes sus regex
* rewrite regex
* remove problematic regex
* fix UTs
* convert PARTITIONED / CLUSTERED BY to ORDER BY for preview
* fix log
* updated to use shuffle
* Web console: Use Ace.Completion directly (#1405)
* Use Ace.Completion directly
* Another Ace.Completion
* better comment
* fix column ordering in e2e test
* add nested data example also
Co-authored-by: John Gozde <john.gozde@imply.io>
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.
Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.
If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.
These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.
However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:
The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)
Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported
Java versions, garbage collection, and strong encapsulation..
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so
tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility
as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not
just Exceptions. This is important because an IllegalAccessError is
encountered when the correct "--add-opens" line is not provided,
which would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray
instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments.
* Use run-java in more places.
* Add run-java.
* Update .gitignore.
* Exclude hadoop-client-api.
Brought in when building on Java 17.
* Swap one more usage of java.
* Fix the run-java script.
* Fix flag.
* Include link to Temurin.
* Spelling.
* Update examples/bin/run-java
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Update blueprint dependencies & LICENSES
* Switch to bp4 namespace; use bp-ns variable in overrides
* Add webpack alias for colors.scss
* Snapshots
* Update selectors in e2e tests
The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.
The default value is now increased to Long.MAX_VALUE.
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.
The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
* refactor and link fixes
* add sql docs to left nav
* code format for needle
* updated web console script
* link fixes
* update earliest/latest functions
* edits for grammar and style
* more link fixes
* another link
* update with #12226
* update .spelling file
* clean up the balancing code around the batched vs deprecated way of sampling segments to balance
* fix docs, clarify comments, add deprecated annotations to legacy code
* remove unused variable
* update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated
* fix dynamic config text for percentOfSegmentsToConsiderPerMove
* run prettier to cleanup coordinator-dynamic-config.tsx changes
* update jest snapshot
* update documentation per review feedback
* add-info-for-compation-config-dialog
* correct the info
* remove space typo
* Revert "remove space typo"
This reverts commit 28b28733ae.
* remove typo space
* update snapshots for jest-test
* Support real query cancelling for web console
* use uuid for queryId, create isSql reuse variable, and add catch for rejectionhandled promise
* remove delete api promise.then() response
* slove conflicts
* update read me with debug
* add degub code to test why CI failed
* included a druid extension called druid-testing-tools and it is not build nor loaded by default
* remove unuse variable
* remove debug log
* add typeIs
* fix unused field in count metric
* better types
* typos
* work with readonly types
* factor out apply cancel buttons
* form editor
* selection type
* unsaved changes
* form editor spec
* tidy up sampler
* more menu controls
* update e2e test
* fix quotes
* fix sql doc parsing
* prevent array-input from losing position while the user is typing
* make group filter click-to-filterable
* fix casing bug in exact table search
* do not sort columns in smaples
* can bypass transform step
* fixed string json parsing
* improve PartitionMessage
* better error messages
* feedback fixes
* tool to order dimensions in schema view
* allow encoding of ascii control chars
* change to JSON
* make json escpaes work
* update snapshot
* break out component
* fix test
* update test script
* update formatter to be more chill
* tidy up
* add to segments view
* add unit tests for date
* better util export
* fix ds view
* fix tests
* fix test in time
* unset untermediate state
* Use common browserlist and update to drop IE11
* Change TypeScript target to ES2016
* Update browserslist for "supports es6" support
* Show a warning if accessed from an unsupported browser
* Inline browser-update styles; detect SyntaxErrors too
* Better wording
* Upgrade to the latest Blueprint
* Refactor RunButton to be FC, use useHotkeys
* Remove dead license
* Update snapshots
* Address feedback
* Wording
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
* lay the groundwork for throttling replicant loads per RunRules execution
* Add dynamic coordinator config to control new replicant threshold.
* remove redundant line
* add some unit tests
* fix checkstyle error
* add documentation for new dynamic config
* improve docs and logs
* Alter how null is handled for new config. If null, manually set as default
* Update some dev dependencies, prettify, tslint-fix
* Sort tsconfig keys for easy comparison
* Set noImplicitThis
* Slightly more accurate types
* Bump Jest and related
* Bump react to latest on v16
* Bump node-sass, sass-loader for node14 support
* Remove node-sass-chokidar (unused)
* More unused dependencies
* Fix blueprint imports
* Webpack 5
* Update webpack config for 'process' usage
* Update playwright-chromium
* Emit esnext modules for tree shaking
* Enable source maps in development
* Dedupe
* Bump babel and things
* npm audit fix
* Add .editorconfig file to match prettier settings
* Update licenses (tslib is 0BSD as of 1.11.2)
https://github.com/microsoft/tslib/pull/96
* Require node >= 10
* Use Node 10 to run e2e tests
* Use 'ws' transport mode for dev server (will be default in next version)
* Remove an 'any'
* No sourcemaps in prod
* Exclude .editorconfig from license checks
* Try nvm for setting node version
* DruidInputSource: Fix issues in column projection, timestamp handling.
DruidInputSource, DruidSegmentReader changes:
1) Remove "dimensions" and "metrics". They are not necessary, because we
can compute which columns we need to read based on what is going to
be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
the timestamp of the returned InputRows was set to the `__time` column
of the input datasource.
(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.
(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.
(1) and (3) are breaking changes.
Web console changes:
1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
compatibility with the new behavior.
Other changes:
1) Add ColumnsFilter, a new class that allows input readers to determine
which columns they need to read. Currently, it's only used by the
DruidInputSource, but it could be used by other columnar input sources
in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.
* Various fixups.
* Uncomment incorrectly commented lines.
* Move TransformSpecTest to the proper module.
* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.
* Fix.
* Fix build.
* Checkstyle.
* Misc fixes.
* Fix test.
* Move config.
* Fix imports.
* Fixup.
* Fix ShuffleResourceTest.
* Add import.
* Smarter exclusions.
* Fixes based on tests.
Also, add TIME_COLUMN constant in the web console.
* Adjustments for tests.
* Reorder test data.
* Update docs.
* Update docs to say Druid 0.22.0 instead of 0.21.0.
* Fix test.
* Fix ITAutoCompactionTest.
* Changes from review & from merging.
* dynamic coord config adding more balancing control
add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.
* fix checkstyle failure
* Make doc more detailed for admin to understand when/why to use new config
* refactor PR to use a % of segments instead of raw number
* update the docs
* remove bad doc line
* fix typo in name of new dynamic config
* update RservoirSegmentSampler to gracefully deal with values > 100%
* add handler for <= 0 in ReservoirSegmentSampler
* fixup CoordinatorDynamicConfigTest naming and argument ordering
* fix items in docs after spellcheck flags
* Fix lgtm flag on missing space in string literal
* improve documentation for new config
* Add default value to config docs and add advice in cluster tuning doc
* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog
* update jest snapshot after console change
* fix spell checker errors
* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider
* add new config back to web console module after merge with master
* fix ReservoirSegmentSamplerTest
* fix line breaks in coordinator console dialog
* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove
* Make improvements based off of feedback in review
* additional cleanup coming from review
* Add a warning log if limit on segments to consider for move can't be calcluated
* remove unused import
* fix tests for CoordinatorDynamicConfig
* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
* no need for intervals
* don't set redundant fields
* fix tests
* better filter control
* work with not
* wrap callout with form group
* update snapshot
* add split hint
* highlight issues with spec
* fixes
* fix default value
* move intervals back to partition step
* work with all sorts of chars
* fix enabled view
* better API escape
* fix escaping issue, bigints
* update licenses
* fix align
* do not show Query with SQL if no SQL
* add prettify script
* update dev readme
* add ordering to the datasource list
* add ordering to supervisor table
* added query error suggestions
* simplify the SQLs
* change segment size display to rows
* suggestion tests
* update snapshot
* make error detection more robust
* remove errant console log
* fix imports
* put suggestion on top
* better error rendering
* format as millions
* add .druid.pid to gitignore
* rename segment_size to segment_rows, fix visability, fix divide by zero
* update snapshots
* Web console: targetRowsPerSegment for hashed partionin
Added `targetRowsPerSegment` to the web console for hashed partition for both
the auto compaction view and as part of the ingestion workflow.
The help text was also updated to indicate when a user should care about
updating these fields
* code review
* update test snapshots
* oops
* Fix Avro OCF detection prefix and run formation detection on raw input
* Support Avro Fixed and Enum types correctly
* Check Avro version byte in format detection
* Add test for AvroOCFReader.sample
Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.
* Document Avro type handling
* Add TS unit tests for guessInputFormat
- Update playwright to latest version
- Provide environment variable to disable/enable headless mode
- Allow running E2E tests against any druid cluster running on standard
ports (tutorial-batch.spec.ts now uses an absolute instead of relative
path for the input data)
- Provide environment variable to change target web console port
- Druid setup does not need to download zookeeper
Restore the web console's ability to view a datasource's compaction
configuration via the "action" menu. Refactoring done in
https://github.com/apache/druid/pull/10438 introduced a regression that
always caused the default compaction configuration to be shown via the
"action" menu instead.
Regression test is added in e2e-tests/auto-compaction.spec.ts.
Add an E2E test for the web console workflow of reindexing a Druid
datasource to change the secondary partitioning type. The new test
changes dynamic to single dim partitions since the autocompaction test
already does dynamic to hashed partitions.
Also, run the web console E2E tests in parallel to reduce CI time and
change naming convention for test datasources to make it easier to map
them to the corresponding test run.
Main changes:
1) web-consolee2e-tests/reindexing.spec.ts
- new E2E test
2) web-console/e2e-tests/component/load-data/data-connector/reindex.ts
- new data loader connector for druid input source
3) web-console/e2e-tests/component/load-data/config/partition.ts
- move partition spec definitions from compaction.ts
- add new single dim partition spec definition
* Compaction config UI optional numShards
Specifying `numShards` for hashed partitions is no longer required after
https://github.com/apache/druid/pull/10419. Update the UI to make
`numShards` an optional field for hash partitions.
* Update snapshot
When using the web console to load data by reindexing from Druid, the
`Datasource` and `Interval` inputs are required during the `Connect`
step. Unlike the `Datasource` input, the `Interval` input did not have a
blue outline to indicate that it was required as the `IntervalInput`
component did not support an `intent` property.
Add an E2E test for the common case web console workflow of setting up
autocompaction that changes the partitions from dynamic to hashed.
Also fix an issue with the async test setup to properly wait for the web
console to be ready.
* compaction dialog update
* fix test snapshot
* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* feedback changes
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* Fix broken sampler for re-indexer
When re-indexing a Druid datasource, the web-console would generate an
invalid inputFormat since the type is not specified.
* code review
* Change color of Run button for native queries
When a user tries to run a native query, change the color of the button to
Druid's secondary color to indicate that the user is not running a SQL query.
Before this change, the web-console would indicate this by changing the text
of the button from Run (SQL queries) to Rune (native queries). Rune could be
confusing to users as this appears to be a typo.
* Update web-console/src/views/query-view/run-button/run-button.scss
* Update web-console/src/views/query-view/run-button/run-button.scss
* Update web-console/src/views/query-view/run-button/run-button.scss
* code review
* Filter http requests by http method
Add a config that allows a user which http methods to allow against their
Druid server.
Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore icode coverage for nitialization classes
* code review
* Segment timeline doesn't show results older than 3 months
* Adoption testing patch for web segment timeline view and also refactoring default time config
Motivation for this change is to not inadvertently identify binary
formats that contain uncompressed string data as TSV or CSV.
Moving detection of magic byte headers before heuristics should be more
robust in general.
* Add AvroOCFInputFormat
* Support supplying a reader schema in AvroOCFInputFormat
* Add docs for Avro OCF input format
* Address review comments
* Address second round of review
* add flag to flattenSpec to keep null columns
* remove changes to inputFormat interface
* add comment
* change comment message
* update web console e2e test
* move keepNullColmns to JSONParseSpec
* fix merge conflicts
* fix tests
* set keepNullColumns to false by default
* fix lgtm
* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns
* Add equals verifier tests
web-console/e2e-tests/tutorial-batch.spec.ts would occasionally timeout
between the transition from the data loader "configure schema" and
"partition" steps due to missing waits when toggling the rollup setting.
Also, fix shellcheck warnings for script/druid.