* dynamic coord config adding more balancing control
add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.
* fix checkstyle failure
* Make doc more detailed for admin to understand when/why to use new config
* refactor PR to use a % of segments instead of raw number
* update the docs
* remove bad doc line
* fix typo in name of new dynamic config
* update RservoirSegmentSampler to gracefully deal with values > 100%
* add handler for <= 0 in ReservoirSegmentSampler
* fixup CoordinatorDynamicConfigTest naming and argument ordering
* fix items in docs after spellcheck flags
* Fix lgtm flag on missing space in string literal
* improve documentation for new config
* Add default value to config docs and add advice in cluster tuning doc
* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog
* update jest snapshot after console change
* fix spell checker errors
* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider
* add new config back to web console module after merge with master
* fix ReservoirSegmentSamplerTest
* fix line breaks in coordinator console dialog
* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove
* Make improvements based off of feedback in review
* additional cleanup coming from review
* Add a warning log if limit on segments to consider for move can't be calcluated
* remove unused import
* fix tests for CoordinatorDynamicConfig
* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
* no need for intervals
* don't set redundant fields
* fix tests
* better filter control
* work with not
* wrap callout with form group
* update snapshot
* add split hint
* highlight issues with spec
* fixes
* fix default value
* move intervals back to partition step
* work with all sorts of chars
* fix enabled view
* better API escape
* fix escaping issue, bigints
* update licenses
* fix align
* do not show Query with SQL if no SQL
* add prettify script
* update dev readme
* add ordering to the datasource list
* add ordering to supervisor table
* added query error suggestions
* simplify the SQLs
* change segment size display to rows
* suggestion tests
* update snapshot
* make error detection more robust
* remove errant console log
* fix imports
* put suggestion on top
* better error rendering
* format as millions
* add .druid.pid to gitignore
* rename segment_size to segment_rows, fix visability, fix divide by zero
* update snapshots
* Web console: targetRowsPerSegment for hashed partionin
Added `targetRowsPerSegment` to the web console for hashed partition for both
the auto compaction view and as part of the ingestion workflow.
The help text was also updated to indicate when a user should care about
updating these fields
* code review
* update test snapshots
* oops
* Fix Avro OCF detection prefix and run formation detection on raw input
* Support Avro Fixed and Enum types correctly
* Check Avro version byte in format detection
* Add test for AvroOCFReader.sample
Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.
* Document Avro type handling
* Add TS unit tests for guessInputFormat
- Update playwright to latest version
- Provide environment variable to disable/enable headless mode
- Allow running E2E tests against any druid cluster running on standard
ports (tutorial-batch.spec.ts now uses an absolute instead of relative
path for the input data)
- Provide environment variable to change target web console port
- Druid setup does not need to download zookeeper
Restore the web console's ability to view a datasource's compaction
configuration via the "action" menu. Refactoring done in
https://github.com/apache/druid/pull/10438 introduced a regression that
always caused the default compaction configuration to be shown via the
"action" menu instead.
Regression test is added in e2e-tests/auto-compaction.spec.ts.
Add an E2E test for the web console workflow of reindexing a Druid
datasource to change the secondary partitioning type. The new test
changes dynamic to single dim partitions since the autocompaction test
already does dynamic to hashed partitions.
Also, run the web console E2E tests in parallel to reduce CI time and
change naming convention for test datasources to make it easier to map
them to the corresponding test run.
Main changes:
1) web-consolee2e-tests/reindexing.spec.ts
- new E2E test
2) web-console/e2e-tests/component/load-data/data-connector/reindex.ts
- new data loader connector for druid input source
3) web-console/e2e-tests/component/load-data/config/partition.ts
- move partition spec definitions from compaction.ts
- add new single dim partition spec definition
* Compaction config UI optional numShards
Specifying `numShards` for hashed partitions is no longer required after
https://github.com/apache/druid/pull/10419. Update the UI to make
`numShards` an optional field for hash partitions.
* Update snapshot
When using the web console to load data by reindexing from Druid, the
`Datasource` and `Interval` inputs are required during the `Connect`
step. Unlike the `Datasource` input, the `Interval` input did not have a
blue outline to indicate that it was required as the `IntervalInput`
component did not support an `intent` property.
Add an E2E test for the common case web console workflow of setting up
autocompaction that changes the partitions from dynamic to hashed.
Also fix an issue with the async test setup to properly wait for the web
console to be ready.
* compaction dialog update
* fix test snapshot
* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* feedback changes
Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
* Fix broken sampler for re-indexer
When re-indexing a Druid datasource, the web-console would generate an
invalid inputFormat since the type is not specified.
* code review
* Change color of Run button for native queries
When a user tries to run a native query, change the color of the button to
Druid's secondary color to indicate that the user is not running a SQL query.
Before this change, the web-console would indicate this by changing the text
of the button from Run (SQL queries) to Rune (native queries). Rune could be
confusing to users as this appears to be a typo.
* Update web-console/src/views/query-view/run-button/run-button.scss
* Update web-console/src/views/query-view/run-button/run-button.scss
* Update web-console/src/views/query-view/run-button/run-button.scss
* code review
* Filter http requests by http method
Add a config that allows a user which http methods to allow against their
Druid server.
Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore icode coverage for nitialization classes
* code review
* Segment timeline doesn't show results older than 3 months
* Adoption testing patch for web segment timeline view and also refactoring default time config
Motivation for this change is to not inadvertently identify binary
formats that contain uncompressed string data as TSV or CSV.
Moving detection of magic byte headers before heuristics should be more
robust in general.
* Add AvroOCFInputFormat
* Support supplying a reader schema in AvroOCFInputFormat
* Add docs for Avro OCF input format
* Address review comments
* Address second round of review
* add flag to flattenSpec to keep null columns
* remove changes to inputFormat interface
* add comment
* change comment message
* update web console e2e test
* move keepNullColmns to JSONParseSpec
* fix merge conflicts
* fix tests
* set keepNullColumns to false by default
* fix lgtm
* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns
* Add equals verifier tests
web-console/e2e-tests/tutorial-batch.spec.ts would occasionally timeout
between the transition from the data loader "configure schema" and
"partition" steps due to missing waits when toggling the rollup setting.
Also, fix shellcheck warnings for script/druid.
Load data and query (i.e., automate
https://druid.apache.org/docs/latest/tutorials/tutorial-batch.html) to
have some basic checks ensuring the web console is wired up to druid
correctly.
The new end-to-end tests (tutorial-batch.spec.ts) are added to
`web-console/e2e-tests`. Within that directory:
- `components` represent the various tabs of the web console. Currently,
abstractions for `load data`, `ingestion`, `datasources`, and `query`
are implemented.
- `components/load-data/data-connector` contains abstractions for the
different data source options available to the data loader's `Connect`
step. Currently, only the `Local file` data source connector is
implemented.
- `components/load-data/config` contains abstractions for the different
configuration options available for each step of the data loader flow.
Currently, the `Configure Schema`, `Partition`, and `Publish` steps
have initial implementation of their configuration options.
- `util` contains various helper methods for the tests and does not
contain abstractions of the web console.
Changes to add the new tests to CI:
- `.travis.yml`: New "web console end-to-end tests" job
- `web-console/jest.*.js`: Refactor jest configurations to have
different flavors for unit tests and for end-to-end tests. In
particular, the latter adds a jest setup configuration to wait for the
web console to be ready (`web-console/e2e-tests/util/setup.ts`).
- `web-console/package.json`: Refactor run scripts to add new script for
running end-to-end tests.
- `web-console/script/druid`: Utility scripts for building, starting,
and stopping druid.
Other changes:
- `pom.xml`: Refactor various settings disable java static checks and to
disable java tests into two new maven profiles. Since the same
settings are used in several places (e.g., .travis.yml, Dockerfiles,
etc.), having them in maven profiles makes it more maintainable.
- `web-console/src/console-application.tsx`: Fix typo ("the the").
* expose props for S3
* added env inputs
* add scarry warning
* use .password
* put the warning front and center
* Update web-console/src/views/load-data-view/load-data-view.tsx
Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
* let prettier rewrap the text
Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
* add support for new version of DQT
* update druid-query-toolkit
* fix direction css
* fix remove
* update package
* remove useless conditional
* bump package
* jest -u
Co-authored-by: Maggie Brewster <maggiebrewster@implydata20sMBP.attlocal.net>
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error
* Fix brace
* Import order
* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill
* Fix tests
* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY
* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters
* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig
* More variable and method renames
* Rename MetadataSegments to SegmentsMetadata
* Javadoc update
* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs
* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()
* Reorder imports
* Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers
* Complete merge
* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests
* Remove MetadataSegmentManager
* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments
* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder
* Fix inspections
* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest
* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods
* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator
* Unused import
* Optimize imports
* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()
* Unused import
* Update terminology in datasource-view.tsx
* Fix label in datasource-view.spec.tsx.snap
* Fix lint errors in datasource-view.tsx
* Doc improvements
* Another attempt to please TSLint
* Another attempt to please TSLint
* Style fixes
* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)
* Try to fix docs build issue
* Javadoc and spelling fixes
* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments
* Address more comments