druid

Commit Graph

Author	SHA1	Message	Date
AmatyaAvadhanula	c41e99e10c	Do not allocate week granular segments unless requested (#15589 ) * Do not allocate week granular segments unless explicitly requested	2024-01-05 12:14:52 +05:30
Alexander T	90af71b371	router.sh is missing in Druid Distribution (#15547 ) Every time we roll out a new version of Druid on our cluster, I recognize that the script for starting the router process is missing. So I added it =)	2023-12-15 10:42:04 -08:00
Vadim Ogievetsky	1df53d6eb3	fix physical memory detection on OSX (#15405 ) * fix physical memory detection on OSX * typo	2023-11-20 23:19:11 -08:00
nasuiyile	9333dd1f73	Correct the path of ipynb file of notebook introduction. (#15327 )	2023-11-07 11:01:06 +08:00
Sergio Ferragut	c9c3df204e	Redirect to new jupyter notebook project (#15136 )	2023-11-01 08:38:40 -07:00
Xavier Léauté	352702bb25	run some integration tests with Java 21 (#15104 ) * use setup-java everywhere for consistency * add Java 21 to integration test matrix * simplify docker build containers script + add Java 21 * fix for Java versions reporting 21-ea	2023-10-20 11:18:13 +08:00
Peter Marshall	0dfd99e381	202307-notebook-unionall (#14726 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-21 10:55:58 -07:00
Peter Marshall	f585f0a8ed	202306-docs-notebook topn (#14478 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-16 14:50:49 -07:00
Jill Osborne	2561477e87	Jupyter nested columns tutorial (#14788 )	2023-08-16 14:45:37 -07:00
Peter Marshall	e33d2db235	202307-notebooks Template amends (#14683 ) Co-authored-by: writer-jill <jill.osborne@imply.io>	2023-08-15 11:25:56 -07:00
Sergio Ferragut	353f7bed7f	Adding data generation pod to jupyter notebooks deployment (#14742 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-08-10 15:43:05 -07:00
Tejaswini Bandlamudi	a45b25fa1d	Removes support for Hadoop 2 (#14763 ) Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop	2023-08-09 17:47:52 +05:30
Suneet Saldanha	00f1f8cef5	Enable ServiceStatusMonitor in the examples (#14744 )	2023-08-03 06:07:01 -07:00
Will Xu	25df122b41	Releasenote notebooks 26 (#14410 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-07-28 16:43:35 -07:00
Sergio Ferragut	6b229f5118	new environment vars for external druid and kafka + jupyter template (#14592 )	2023-07-25 11:02:16 -07:00
Nhi Pham	a764ed7fde	Update Jupyter notebook tutorial instructions for ARM devices (#14459 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-07-11 10:01:20 -07:00
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old fomatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Nhi Pham	4ee7b14f5f	update links in jupyter notebook (#14404 )	2023-07-03 13:50:25 -07:00
Peter Marshall	b6d6e3b827	Update start-druid-main.py (#14471 ) Quick typo correction.	2023-06-23 14:07:24 +05:30
Sergio Ferragut	1a9aefbb0f	Move from Jupyter notebook to Jupyter Lab and introduce a notebook folder structure (#14419 )	2023-06-21 09:11:00 -07:00
Gian Merlino	2b676ac7f8	Quieter KafkaSupervisors in all bundled log4j2.xml. (#14444 ) Follow-up to #13392, which added this to a single log4j2.xml.	2023-06-19 12:04:11 +05:30
Charles Smith	37cb76d545	fixes dataSourceName varaible ref (#14340 )	2023-05-30 13:15:27 -07:00
Charles Smith	88831b1dd0	Docs: Updates docker compose to turn off kraft which causes errors (#14335 )	2023-05-24 09:33:32 -07:00
Katya Macedo	269137c682	Update Ingestion section (#14023 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-05-19 09:42:27 -07:00
Victoria Lim	058eb99a8b	Docs: Update Docker profile and fix method call in `druidapi` tutorial (#14308 )	2023-05-18 07:29:02 -07:00
Abhishek Radhakrishnan	7400ed3c93	Fixup data deletion tutorial docs (#14283 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-05-17 17:05:35 -07:00
Charles Smith	c84c174caa	update tutorials to use clarify druid host location for Docker Compose + Druid version (#14295 )	2023-05-17 15:41:02 -07:00
Victoria Lim	66d4ea014c	Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984 )	2023-05-15 15:20:52 -07:00
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
Abhishek Radhakrishnan	5ce1b0903e	Add basic security functions to druidapi (follow up to #14009 ) (#14055 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Paul Rogers <progers@apache.org>	2023-04-11 10:55:27 -07:00
Victoria Lim	ede9903ff4	pip install for Python Druid API (#13938 ) Broken test appears unrelated to this PR * make druidapi pip installable * include druidapi in prerequisites * add license to setup.py * updates from Paul's review * note about editable install * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * update install instructions * found unrelated typos * standardize install cmd with pip --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-03-21 11:37:39 -07:00
Gian Merlino	f0fb094cc7	Fix start-druid for indexers. (#13891 ) There was an unused parameter causing the unpack to fail.	2023-03-08 10:32:07 -08:00
Paul Rogers	a580aca551	Python Druid API for use in notebooks (#13787 ) Python Druid API for use in notebooks Revises existing notebooks and readme to reference the new API. Notebook to explain the new API. Split README into a console version and a notebook version to work around lack of a nice display for md files. Update the REST API notebook to use simpler Requests calls Converted the SQL tutorial to use the Python library README file, converted to using properties --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-04 18:25:19 -08:00
Katya Macedo	1595653e6f	docs: add a link for the Druid SQL tutorial (#13468 ) * docs: add juptyer API tutorial for API and jupyter tutorial index (#3) (cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583) * update prereqs and fix jupyterlab name * Removing notebook since 13345 has it 13345 should be merged first * update contributing instructions * docs: link to the Druid SQL tutorial * Add link to partitioning * fix merge conflict * Saving * Update docs/tutorials/tutorial-jupyter-index.md * Remove partitioning --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: brian.le <brian.le@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-22 09:36:13 -08:00
Katya Macedo	58d9720b00	docs: notebook only for SQL tutorial (#13465 ) CI Failures seem unrelated to docs * docs: notebook only for SQL tutorial * Update logical operators section * Fix typo * Adopt review suggestions * Update examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb * Update examples, add link to keywords * Update after review * Update per review comments * Add links	2023-02-08 20:04:53 -08:00
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
Rishabh Singh	a83d1cdf26	fix var name (#13657 )	2023-01-11 21:15:30 +05:30
317brian	d9c27d6102	docs: add index page and related stuff for jupyter tutorials (#13342 )	2022-12-16 13:33:50 -08:00
Rishabh Singh	f42722e627	Set monotonically increasing worker capacity in start-druid-main (#13581 ) This commit updates the task memory allocation logic. - min task count is 2 and max task count is number of cpus on the machine - task count increases wrt total task memory - task memory increases from 512m to 2g	2022-12-16 15:34:30 +05:30
Clint Wylie	d9e5245ff0	allow string dimension indexer to handle byte[] as base64 strings (#13573 ) This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`. This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of a base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer`, it resulted in an ugly java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)` which does do this coercion (and why many tests on binary types appeared to be doing the expected thing). I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly, and call `getDimension` to validate stuff. The parser based version still uses the old conversion mechanisms, so when not using a flattener incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now, if we either update the parser to use the new flattener or just .. remove parsers we can remove this test stuff.	2022-12-16 14:50:17 +05:30
317brian	668d1fad6b	docs: notebook only for API tutorial (#13345 ) * docs: notebook for API tutorial * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * address the other comments * typo * add commentary to outputs * address feedback from will * delete unnecessary comment Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-15 13:16:07 -08:00
Rishabh Singh	97bc0220c7	Update task memory computation in start-druid (#13563 ) Changes: * Use 80% of memory specified for running services (versus 50% earlier). * Tasks get either 512m / 1024m or 2048m now (versus 512m or 2048m earlier). * Add direct memory for router.	2022-12-15 11:06:16 +05:30
Vadim Ogievetsky	2729e25295	Link to java docs (#13478 ) * add link to page about selecting a JRE * add link to script also * simplify text	2022-12-14 11:45:23 -08:00
Rishabh Singh	8e386072e9	Druid automated quickstart: zookeeper in service list (#13550 )	2022-12-12 10:29:43 -08:00
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove exmaples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * udpate spelling * update .spelling * minor change * Minor change * memory comptuation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Gian Merlino	c61313f4c4	Quieter streaming supervisors. (#13392 ) Eliminates two common sources of noise with Kafka supervisors that have large numbers of tasks and partitions: 1) Log the report at DEBUG rather than INFO level at each run cycle. It can get quite large, and can be retrieved via API when needed. 2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals package. Avoids a log message per-partition per-minute as part of seeking to the latest offset in the reporting thread. In the tasks, where this sort of logging might be more useful, we have another log message with the same information: "Seeking partition[%s] to[%s]".	2022-11-20 23:53:17 -08:00
AmatyaAvadhanula	41e51b21c3	Make http options the default configurations (#13092 ) Druid currently uses Zookeeper dependent options as the default. This commit updates the following to use HTTP as the default instead. - task runner. `druid.indexer.runner.type=remote -> httpRemote` - load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http` - server inventory view. `druid.serverview.type=curator -> http`	2022-10-05 05:35:17 +05:30
Frank Chen	eff7c64228	export com.sun.management.internal (#13068 )	2022-09-12 09:03:22 -07:00
Vadim Ogievetsky	2a039e7e6a	Add CTA and fix typo (#13009 ) * Add CTA and fix typo * resolve hostname better	2022-09-06 11:16:50 -07:00

1237 Commits