* Claim full support for Java 17.
No production code has changed, except the startup scripts.
Changes:
1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.
2) Include the full list of opens and exports on both Java 11 and 17.
3) Document that Java 17 is both supported and preferred.
4) Switch some tests from Java 11 to 17 to get better coverage on the
preferred version.
* Doc update.
* Update errorprone.
* Update docker_build_containers.sh.
* Update errorprone in licenses.yaml.
* Add some more run-javas.
* Additional run-javas.
* Update errorprone.
* Suppress new errorprone error.
* Add exports and opens in ForkingTaskRunner for Java 11+.
Test, doc changes.
* Additional errorprone updates.
* Update for errorprone.
* Restore old fomatting in LdapCredentialsValidator.
* Copy bin/ too.
* Fix Java 15, 17 build line in docker_build_containers.sh.
* Update busybox image.
* One more java command.
* Fix interpolation.
* IT commandline refinements.
* Switch to busybox 1.34.1-glibc.
* POM adjustments, build and test one IT on 17.
* Additional debugging.
* Fix silly thing.
* Adjust command line.
* Add exports and opens one more place.
* Additional harmonization of strong encapsulation parameters.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
* Make LoggingEmitter more useful
* Skip code coverage for facade classes
* fix spellcheck
* code review
* fix dependency
* logging.md
* fix checkstyle
* Add back jacoco version to main pom
* Make the tasks run with only a single directory
There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.
This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.
It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
Broken test appears unrelated to this PR
* make druidapi pip installable
* include druidapi in prerequisites
* add license to setup.py
* updates from Paul's review
* note about editable install
* Apply suggestions from code review
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* update install instructions
* found unrelated typos
* standardize install cmd with pip
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Python Druid API for use in notebooks
Revises existing notebooks and readme to reference
the new API.
Notebook to explain the new API.
Split README into a console version and a notebook
version to work around lack of a nice display for
md files.
Update the REST API notebook to use simpler Requests calls
Converted the SQL tutorial to use the Python library
README file, converted to using properties
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* docs: add juptyer API tutorial for API and jupyter tutorial index (#3)
(cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583)
* update prereqs and fix jupyterlab name
* Removing notebook since 13345 has it
13345 should be merged first
* update contributing instructions
* docs: link to the Druid SQL tutorial
* Add link to partitioning
* fix merge conflict
* Saving
* Update docs/tutorials/tutorial-jupyter-index.md
* Remove partitioning
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: brian.le <brian.le@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit updates the task memory allocation logic.
- min task count is 2 and max task count is number of cpus on the machine
- task count increases wrt total task memory
- task memory increases from 512m to 2g
This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`.
This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of a base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer`, it resulted in an ugly java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)` which does do this coercion (and why many tests on binary types appeared to be doing the expected thing).
I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly, and call `getDimension` to validate stuff. The parser based version still uses the old conversion mechanisms, so when not using a flattener incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now, if we either update the parser to use the new flattener or just .. remove parsers we can remove this test stuff.
* docs: notebook for API tutorial
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* address the other comments
* typo
* add commentary to outputs
* address feedback from will
* delete unnecessary comment
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks get either 512m / 1024m or 2048m now (versus 512m or 2048m earlier).
* Add direct memory for router.
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
Eliminates two common sources of noise with Kafka supervisors that have
large numbers of tasks and partitions:
1) Log the report at DEBUG rather than INFO level at each run cycle.
It can get quite large, and can be retrieved via API when needed.
2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals
package. Avoids a log message per-partition per-minute as part of seeking
to the latest offset in the reporting thread. In the tasks, where this
sort of logging might be more useful, we have another log message with
the same information: "Seeking partition[%s] to[%s]".
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`