* Update examples/bin/dsql scripts to accept Python 3
Remove redundant urllib import
Translate to Python 3: change xrange to range
Translate to Python 3: change long to int
Translate to Python 3: change urllib2 methods and fix encoding/decoding issues
Remove unnecessary import
Add option for Python2
Rename files
* Update examples/bin/dsql
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Resolve PR comments
Add comment in files indicating updates need to be made in both places
Update examples/bin/dsql
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update error output when using Python 2.
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
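For illustration only, the kinds of Python 2 to Python 3 changes listed above (xrange to range, long to int, urllib2 to urllib.request, explicit encoding/decoding) might look roughly like the sketch below. The helper names, the request payload shape, and the example URL are assumptions made for this sketch and are not taken from the dsql script itself.

```python
import json
import urllib.request  # Python 3 home of what used to be urllib2


def build_sql_request(url, sql):
    """Build a POST request (hypothetical helper, not from dsql).

    In Python 3 the request body must be bytes, so the JSON text
    is encoded explicitly before it is attached to the request.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def decode_response(raw_bytes):
    """Decode response bytes to text before parsing (hypothetical helper)."""
    return json.loads(raw_bytes.decode("utf-8"))


# range replaces xrange, and int replaces long (int is unbounded in Python 3).
total = sum(int(i) for i in range(5))
```

No network call is made here; the request object is only constructed, so the sketch can be run without a Druid cluster.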
* Claim full support for Java 17.
No production code has changed, except the startup scripts.
Changes:
1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.
2) Include the full list of opens and exports on both Java 11 and 17.
3) Document that Java 17 is both supported and preferred.
4) Switch some tests from Java 11 to 17 to get better coverage on the
preferred version.
* Doc update.
* Update errorprone.
* Update docker_build_containers.sh.
* Update errorprone in licenses.yaml.
* Add some more run-javas.
* Additional run-javas.
* Update errorprone.
* Suppress new errorprone error.
* Add exports and opens in ForkingTaskRunner for Java 11+.
Test, doc changes.
* Additional errorprone updates.
* Update for errorprone.
* Restore old formatting in LdapCredentialsValidator.
* Copy bin/ too.
* Fix Java 15, 17 build line in docker_build_containers.sh.
* Update busybox image.
* One more java command.
* Fix interpolation.
* IT commandline refinements.
* Switch to busybox 1.34.1-glibc.
* POM adjustments, build and test one IT on 17.
* Additional debugging.
* Fix silly thing.
* Adjust command line.
* Add exports and opens one more place.
* Additional harmonization of strong encapsulation parameters.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
* Make LoggingEmitter more useful
* Skip code coverage for facade classes
* fix spellcheck
* code review
* fix dependency
* logging.md
* fix checkstyle
* Add back jacoco version to main pom
* Make the tasks run with only a single directory
There was a change that tried to get indexing to run on multiple disks.
It made a number of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself,
making it extremely difficult to do anything correctly inside of a task.
This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.
It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
Broken test appears unrelated to this PR
* make druidapi pip installable
* include druidapi in prerequisites
* add license to setup.py
* updates from Paul's review
* note about editable install
* Apply suggestions from code review
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* update install instructions
* found unrelated typos
* standardize install cmd with pip
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Python Druid API for use in notebooks
Revises existing notebooks and readme to reference
the new API.
Notebook to explain the new API.
Split README into a console version and a notebook
version to work around lack of a nice display for
md files.
Update the REST API notebook to use simpler Requests calls
Converted the SQL tutorial to use the Python library
README file, converted to using properties
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
* docs: add Jupyter tutorial for the API and Jupyter tutorial index (#3)
(cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583)
* update prereqs and fix jupyterlab name
* Removing notebook since 13345 has it
13345 should be merged first
* update contributing instructions
* docs: link to the Druid SQL tutorial
* Add link to partitioning
* fix merge conflict
* Saving
* Update docs/tutorials/tutorial-jupyter-index.md
* Remove partitioning
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: brian.le <brian.le@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit updates the task memory allocation logic.
- min task count is 2 and max task count is the number of CPUs on the machine
- task count increases with total task memory
- task memory increases from 512m to 2g
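A rough sketch of how rules like these could be combined is shown below. It is not the logic from the startup scripts: the function name, the 512 MB step used to grow the task count, and the even split of memory across tasks are all assumptions made for illustration. Only the bounds (at least 2 tasks, at most one per CPU, 512m to 2g per task) come from the list above.

```python
def plan_tasks(total_task_memory_mb, cpu_count):
    """Hypothetical planner: returns (task_count, task_memory_mb).

    Assumption: the task count grows by one for every 512 MB of total
    task memory, clamped to [2, cpu_count]; remaining memory is split
    evenly per task and clamped to [512, 2048] MB.
    """
    max_tasks = max(2, cpu_count)
    task_count = min(max_tasks, max(2, total_task_memory_mb // 512))
    per_task_mb = total_task_memory_mb // task_count
    task_memory_mb = max(512, min(2048, per_task_mb))
    return task_count, task_memory_mb


print(plan_tasks(1024, 8))
print(plan_tasks(4096, 4))
print(plan_tasks(65536, 8))
```

Under these assumptions, a small budget stays at the 2-task minimum with 512 MB each, while a large budget on an 8-CPU machine saturates at 8 tasks with 2048 MB each.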
This PR expands `StringDimensionIndexer` to convert `byte[]` values to base64 encoded strings, rather than the current behavior of calling Java `toString`.
This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to convert values directly to Java types, resulting in `bytes` typed values being converted to `byte[]` instead of the base64 string that the previous JSON-based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as by complex type serde), when fed to a `StringDimensionIndexer` it resulted in an ugly Java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)`, which does perform this coercion (which is why many tests on binary types appeared to be doing the expected thing).
I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly and call `getDimension` to validate values. The parser-based version still uses the old conversion mechanisms, so when not using a flattener it incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now; if we either update the parser to use the new flattener or just remove parsers, we can remove this test code.
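The coercion described above can be illustrated with a small Python analogue. This is only a sketch of the idea (binary values become base64 text, everything else is stringified); `coerce_dimension_value` is a hypothetical name and is not the Java code in `StringDimensionIndexer`.

```python
import base64


def coerce_dimension_value(value):
    """Hypothetical analogue of the string coercion described above.

    Binary values are base64 encoded so the resulting string is stable
    and reversible, instead of relying on an object's default string
    form (in Java, byte[].toString() yields an identity-style string
    rather than the contents).
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    return str(value)


encoded = coerce_dimension_value(b"druid")
print(encoded)                    # base64 text for the raw bytes
print(base64.b64decode(encoded))  # round-trips back to the original bytes
```

The round trip is the point of the design choice: a base64 string can be decoded back to the original bytes, whereas a default `toString` of an array cannot.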
* docs: notebook for API tutorial
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* address the other comments
* typo
* add commentary to outputs
* address feedback from Will
* delete unnecessary comment
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks now get 512m, 1024m, or 2048m (versus 512m or 2048m earlier).
* Add direct memory for router.
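To make the first change concrete, the arithmetic can be sketched as follows. The function name and the 16384 MB input are assumptions chosen for illustration; only the 80% and 50% figures come from the list above.

```python
def service_memory_budget_mb(specified_mb, percent):
    """Hypothetical helper: memory handed to services, in whole MB.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    return specified_mb * percent // 100


specified_mb = 16384  # example value, not from the commit
print(service_memory_budget_mb(specified_mb, 50))  # earlier behavior
print(service_memory_budget_mb(specified_mb, 80))  # behavior after this change
```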