1243 Commits

Author SHA1 Message Date
Virushade a4c971c373
Fix Multiple GC declared error when running Druid Cluster (#17078)
* Remove UseG1GC in all jvm.config to prevent multiple GC declared error
2024-09-20 11:06:13 +02:00
Rishabh Singh 39161b0b23
Use vault.centos.org to build Hadoop docker image (#16999)
The Dockerfile for building the Hadoop image is broken due to CentOS 7 EOL.
Fixed it as per https://serverfault.com/a/1161847.
2024-09-05 10:36:55 +05:30
Virushade f290cf083a
Update examples/bin/dsql scripts to accept Python 3 (#16677)
* Update examples/bin/dsql scripts to accept Python 3

Remove redundant urllib import

Translating to Python3: Changing xrange to range

Translating to Python3: Changing long to int

Translating to Python3: Change urllib2 methods, and fix encoding/decoding issues

Remove unnecessary import

Add option for Python2

Rename files

* Update examples/bin/dsql

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Resolve PR comments

Add comment in files indicating updates need to be made in both places

Update examples/bin/dsql

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update error output when using Python 2.

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-07-03 15:52:57 +08:00
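
The Python 3 translations named in this commit (xrange to range, long to int, urllib2 to urllib.request, plus explicit encoding and decoding) can be illustrated with a minimal sketch. This is not the actual dsql code: the function names and the example URL below are assumptions chosen for illustration only.

```python
import json
import urllib.request  # Python 3 replacement for urllib2


def build_sql_request(url, sql):
    """Build a POST request carrying a SQL query as a JSON body.

    Hypothetical helper: Python 3 requires the body to be bytes,
    so the JSON text is encoded explicitly.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def decode_response(raw_bytes):
    """Decode a bytes response into Python objects (str vs bytes matters in Python 3)."""
    return json.loads(raw_bytes.decode("utf-8"))


def row_indexes(count):
    """range replaces xrange; int replaces long (Python 3 ints are unbounded)."""
    return list(range(int(count)))
```

The request is only constructed here, not sent, so the sketch runs without a Druid cluster.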
Rishabh Singh f51c7b346f
Add druid parquet extensions to example quickstarts (#16664)
This change adds druid-parquet-extensions to all example quickstarts
2024-06-27 14:41:58 +05:30
Charles Smith 2a42b11660
remove legacy Jupyter tutorial files (#15834)
* remove legacy files

* redirection for the jupyter tutorial page

* remove tutorial from sidebar

* remove redirection
2024-02-12 13:45:47 -08:00
Abhishek Agarwal 0ab2781a7f
Disable eager initialization for non-query connection requests (#15751)
2024-01-25 14:38:50 +05:30
AmatyaAvadhanula c41e99e10c
Do not allocate week granular segments unless requested (#15589)
* Do not allocate week granular segments unless explicitly requested
2024-01-05 12:14:52 +05:30
Alexander T 90af71b371
router.sh is missing in Druid Distribution (#15547)
Every time we roll out a new version of Druid on our cluster, I notice that the script for starting the router process is missing. So I added it =)
2023-12-15 10:42:04 -08:00
Vadim Ogievetsky 1df53d6eb3
fix physical memory detection on OSX (#15405)
* fix physical memory detection on OSX

* typo
2023-11-20 23:19:11 -08:00
nasuiyile 9333dd1f73
Correct the path of ipynb file of notebook introduction. (#15327)
2023-11-07 11:01:06 +08:00
Sergio Ferragut c9c3df204e
Redirect to new jupyter notebook project (#15136)
2023-11-01 08:38:40 -07:00
Xavier Léauté 352702bb25
run some integration tests with Java 21 (#15104)
* use setup-java everywhere for consistency

* add Java 21 to integration test matrix

* simplify docker build containers script + add Java 21

* fix for Java versions reporting 21-ea
2023-10-20 11:18:13 +08:00
Peter Marshall 0dfd99e381
202307-notebook-unionall (#14726)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 10:55:58 -07:00
Peter Marshall f585f0a8ed
202306-docs-notebook topn (#14478)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-16 14:50:49 -07:00
Jill Osborne 2561477e87
Jupyter nested columns tutorial (#14788)
2023-08-16 14:45:37 -07:00
Peter Marshall e33d2db235
202307-notebooks Template amends (#14683)
Co-authored-by: writer-jill <jill.osborne@imply.io>
2023-08-15 11:25:56 -07:00
Sergio Ferragut 353f7bed7f
Adding data generation pod to jupyter notebooks deployment (#14742)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-08-10 15:43:05 -07:00
Tejaswini Bandlamudi a45b25fa1d
Removes support for Hadoop 2 (#14763)
Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop
2023-08-09 17:47:52 +05:30
Suneet Saldanha 00f1f8cef5
Enable ServiceStatusMonitor in the examples (#14744)
2023-08-03 06:07:01 -07:00
Will Xu 25df122b41
Releasenote notebooks 26 (#14410)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-07-28 16:43:35 -07:00
Sergio Ferragut 6b229f5118
new environment vars for external druid and kafka + jupyter template (#14592)
2023-07-25 11:02:16 -07:00
Nhi Pham a764ed7fde
Update Jupyter notebook tutorial instructions for ARM devices (#14459)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-07-11 10:01:20 -07:00
Gian Merlino 63ee69b4e8
Claim full support for Java 17. (#14384)
* Claim full support for Java 17.

No production code has changed, except the startup scripts.

Changes:

1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.

2) Include the full list of opens and exports on both Java 11 and 17.

3) Document that Java 17 is both supported and preferred.

4) Switch some tests from Java 11 to 17 to get better coverage on the
   preferred version.

* Doc update.

* Update errorprone.

* Update docker_build_containers.sh.

* Update errorprone in licenses.yaml.

* Add some more run-javas.

* Additional run-javas.

* Update errorprone.

* Suppress new errorprone error.

* Add exports and opens in ForkingTaskRunner for Java 11+.

Test, doc changes.

* Additional errorprone updates.

* Update for errorprone.

* Restore old formatting in LdapCredentialsValidator.

* Copy bin/ too.

* Fix Java 15, 17 build line in docker_build_containers.sh.

* Update busybox image.

* One more java command.

* Fix interpolation.

* IT commandline refinements.

* Switch to busybox 1.34.1-glibc.

* POM adjustments, build and test one IT on 17.

* Additional debugging.

* Fix silly thing.

* Adjust command line.

* Add exports and opens one more place.

* Additional harmonization of strong encapsulation parameters.
2023-07-07 12:52:35 -07:00
Nhi Pham 4ee7b14f5f
update links in jupyter notebook (#14404)
2023-07-03 13:50:25 -07:00
Peter Marshall b6d6e3b827
Update start-druid-main.py (#14471)
Quick typo correction.
2023-06-23 14:07:24 +05:30
Sergio Ferragut 1a9aefbb0f
Move from Jupyter notebook to Jupyter Lab and introduce a notebook folder structure (#14419)
2023-06-21 09:11:00 -07:00
Gian Merlino 2b676ac7f8
Quieter KafkaSupervisors in all bundled log4j2.xml. (#14444)
Follow-up to #13392, which added this to a single log4j2.xml.
2023-06-19 12:04:11 +05:30
Charles Smith 37cb76d545
fixes dataSourceName variable ref (#14340)
2023-05-30 13:15:27 -07:00
Charles Smith 88831b1dd0
Docs: Updates docker compose to turn off kraft which causes errors (#14335)
2023-05-24 09:33:32 -07:00
Katya Macedo 269137c682
Update Ingestion section (#14023)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-05-19 09:42:27 -07:00
Victoria Lim 058eb99a8b
Docs: Update Docker profile and fix method call in `druidapi` tutorial (#14308)
2023-05-18 07:29:02 -07:00
Abhishek Radhakrishnan 7400ed3c93
Fixup data deletion tutorial docs (#14283)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-05-17 17:05:35 -07:00
Charles Smith c84c174caa
update tutorials to clarify druid host location for Docker Compose + Druid version (#14295)
2023-05-17 15:41:02 -07:00
Victoria Lim 66d4ea014c
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984)
2023-05-15 15:20:52 -07:00
Suneet Saldanha 84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00
Abhishek Radhakrishnan 5ce1b0903e
Add basic security functions to druidapi (follow up to #14009) (#14055)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Paul Rogers <progers@apache.org>
2023-04-11 10:55:27 -07:00
Victoria Lim ede9903ff4
pip install for Python Druid API (#13938)
Broken test appears unrelated to this PR

* make druidapi pip installable

* include druidapi in prerequisites

* add license to setup.py

* updates from Paul's review

* note about editable install

* Apply suggestions from code review

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* update install instructions

* found unrelated typos

* standardize install cmd with pip

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-03-21 11:37:39 -07:00
Gian Merlino f0fb094cc7
Fix start-druid for indexers. (#13891)
There was an unused parameter causing the unpack to fail.
2023-03-08 10:32:07 -08:00
Paul Rogers a580aca551
Python Druid API for use in notebooks (#13787)
Python Druid API for use in notebooks

Revises existing notebooks and readme to reference
the new API.

Notebook to explain the new API.

Split README into a console version and a notebook
version to work around lack of a nice display for
md files.

Update the REST API notebook to use simpler Requests calls

Converted the SQL tutorial to use the Python library

README file, converted to using properties

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-03-04 18:25:19 -08:00
Katya Macedo 1595653e6f
docs: add a link for the Druid SQL tutorial (#13468)
* docs: add jupyter API tutorial for API and jupyter tutorial index (#3)

(cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583)

* update prereqs and fix jupyterlab name

* Removing notebook since 13345 has it

13345 should be merged first

* update contributing instructions

* docs: link to the Druid SQL tutorial

* Add link to partitioning

* fix merge conflict

* Saving

* Update docs/tutorials/tutorial-jupyter-index.md

* Remove partitioning

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: brian.le <brian.le@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-22 09:36:13 -08:00
Katya Macedo 58d9720b00
docs: notebook only for SQL tutorial (#13465)
CI Failures seem unrelated to docs

* docs: notebook only for SQL tutorial

* Update logical operators section

* Fix typo

* Adopt review suggestions

* Update examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb

* Update examples, add link to keywords

* Update after review

* Update per review comments

* Add links
2023-02-08 20:04:53 -08:00
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Rishabh Singh a83d1cdf26
fix var name (#13657)
2023-01-11 21:15:30 +05:30
317brian d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342)
2022-12-16 13:33:50 -08:00
Rishabh Singh f42722e627
Set monotonically increasing worker capacity in start-druid-main (#13581)
This commit updates the task memory allocation logic.
- min task count is 2 and max task count is number of cpus on the machine
- task count increases wrt total task memory
- task memory increases from 512m to 2g
2022-12-16 15:34:30 +05:30
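
The allocation rules listed in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in start-druid-main: the function name, the clamping order, and the integer division are hypothetical; only the bounds (at least 2 tasks, at most one per CPU, 512m to 2g per task) come from the commit message.

```python
def plan_task_capacity(total_task_memory_mb, cpu_count):
    """Return (task_count, task_memory_mb) for a given task memory budget.

    Hypothetical sketch: the task count never decreases as the budget
    grows, which is the monotonic property the commit title refers to.
    """
    min_tasks = 2
    min_task_mb, max_task_mb = 512, 2048
    max_tasks = max(min_tasks, cpu_count)

    # As many minimum-sized tasks as fit, clamped to [min_tasks, max_tasks].
    task_count = min(max(total_task_memory_mb // min_task_mb, min_tasks), max_tasks)

    # Split the budget across the tasks, clamped to [512m, 2g] per task.
    # With a very small budget the 512m floor can exceed the budget itself.
    task_mb = min(max(total_task_memory_mb // task_count, min_task_mb), max_task_mb)
    return task_count, task_mb
```

Under these assumptions, a 4096m budget on a 4-CPU machine yields 4 tasks of 1024m each, while a much larger budget on the same machine stays at 4 tasks capped at 2048m.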
Clint Wylie d9e5245ff0
allow string dimension indexer to handle byte[] as base64 strings (#13573)
This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`. 

This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of a base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer`, it resulted in an ugly java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)` which does do this coercion (and why many tests on binary types appeared to be doing the expected thing).

I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly, and call `getDimension` to validate stuff. The parser based version still uses the old conversion mechanisms, so when not using a flattener it incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now; if we either update the parser to use the new flattener or just remove parsers, we can remove this test stuff.
2022-12-16 14:50:17 +05:30
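
The coercion described in this commit, base64-encoding raw bytes instead of taking their default string form, can be illustrated with a small Python analogue. The actual change is in Java inside `StringDimensionIndexer`; the function name below is hypothetical and exists only for this sketch.

```python
import base64


def to_dimension_string(value):
    """Coerce a raw row value to the string stored in a string dimension.

    Hypothetical analogue: bytes become a base64 string, everything
    else falls back to its ordinary string form.
    """
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    return str(value)
```

Without the bytes branch, `str()` on a bytes value would return its repr, which plays the role of the "ugly" Java `toString` output mentioned above.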
317brian 668d1fad6b
docs: notebook only for API tutorial (#13345)
* docs: notebook for API tutorial

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* address the other comments

* typo

* add commentary to outputs

* address feedback from will

* delete unnecessary comment

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-15 13:16:07 -08:00
Rishabh Singh 97bc0220c7
Update task memory computation in start-druid (#13563)
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks get either 512m, 1024m, or 2048m now (versus 512m or 2048m earlier).
* Add direct memory for router.
2022-12-15 11:06:16 +05:30
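
The two numeric changes in this commit, the 80% service budget and the three task memory tiers, can be sketched as below. The function names and the rule of snapping down to the nearest tier are assumptions for illustration; the commit message gives only the percentages and the tier values, not the selection logic.

```python
TASK_TIERS_MB = (2048, 1024, 512)  # tier values taken from the commit message


def services_budget_mb(total_memory_mb, percent=80):
    """Memory made available to running services (80% now, 50% earlier)."""
    return total_memory_mb * percent // 100


def snap_to_tier(requested_mb):
    """Pick the largest tier not exceeding the request (hypothetical rule).

    Requests below the smallest tier fall back to 512m.
    """
    for tier_mb in TASK_TIERS_MB:
        if requested_mb >= tier_mb:
            return tier_mb
    return TASK_TIERS_MB[-1]
```

For a machine given 16384m, this budget is 13107m at 80% versus 8192m at the earlier 50%.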
Vadim Ogievetsky 2729e25295
Link to java docs (#13478)
* add link to page about selecting a JRE

* add link to script also

* simplify text
2022-12-14 11:45:23 -08:00