druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation.. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Gian Merlino	0ca37c20a6	Python 3 support for post-index-task. (#12841 ) * Python 3 support for post-index-task. Useful when running on macOS or any other system that doesn't have Python 2. * Encode JSON returned by read_task_file. * Adjust. * Skip needless loads. * Add a decode. * Additional decodes needed.	2022-08-02 17:53:34 -07:00
Charles Smith	efbb58e90e	docs: remove maxRowsPerSegment where appropriate (#12071 ) * remove maxRowsPerSegment where appropriate * fix tutorial, accept suggestions * Update docs/design/coordinator.md * additional tutorial file * fix initial index spec * accept comments * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add back comment on maxrows per segment * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md 
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rm duplicate entry * Update native-batch-simple-task.md remove ref to `maxrowspersegment` * Update native-batch.md remove ref to `maxrowspersegment` * final tenticles * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-28 16:52:13 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
Gian Merlino	a27f4f5740	Service stdout log files, move logs to log/. (#12570 ) * Service stdout log files, move logs to log/. Two changes that make log behavior cleaner: 1) Redirect messages from the Java runtime to their own log files. Otherwise, they would get jumbled up in the output of the all-in-one start command. 2) Use log/ instead of bin/log/ for the default log directory. Makes them easier to find. Additionally, add documentation about how to avoid the reflective access warnings in Java 11. * Spelling. * See if code formatting affects spelling.	2022-06-03 10:44:29 +05:30
Tiffany Yeh	665c926824	Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248 ) Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial. The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: `be45d20302/centos/8u282-8.52.0.23/Dockerfile`.	2022-04-11 20:28:09 +05:30
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Gian Merlino	3b373114dc	Officially support Java 11. (#12232 ) There aren't any changes in this patch that improve Java 11 compatibility; these changes have already been done separately. This patch merely updates documentation and explicit Java version checks. The log message adjustments in DruidProcessingConfig are there to make things a little nicer when running in Java 11, where we can't measure direct memory _directly_, and so we may auto-size processing buffers incorrectly.	2022-03-04 14:15:45 -08:00
Karan Kumar	a080fcdd7b	Fixing hadoop 3 Dockerfile (#12284 )	2022-02-26 19:18:29 +05:30
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Laksh Singla	dc1703d5f9	Change value of `druid.sql.planner.useGroupingSetForExactDistinct` in common.runtime.properties (#12182 ) This PR changes the value of the property `druid.sql.planner.useGroupingSetForExactDistinct` from `false` to `true` in the runtime.properties files, so that newer installations have this property as `true`, while the default still remains as `false`. The flag determines how queries which contain an aggregation over `DISTINCT` like `SELECT COUNT(DISTINCT foo.dim1) FILTER(WHERE foo.cnt = 1), SUM(foo.cnt) FROM druid.foo` get planned by Calcite. With the flag being set to false, it plans it via joins, whereas with it being set to true, the query is set using grouping sets. There is a known issue with Calcite (https://github.com/apache/druid/issues/7953), where an NPE is thrown while planning the above query with joins. There is no such issue while planning the query using grouping sets.	2022-01-24 14:00:03 +05:30
Michka Popoff	590cf993c0	Replace source call to make scripts more portable (#12014 ) Fixes #10744 Fixes: ./bin/node.sh: 44: ./bin/node.sh: source: not found Could not find java - please run /opt/druid/apache-druid-0.20.0/bin/verify-java to confirm it is installed.	2021-12-06 13:41:25 +05:30
Frank Chen	4631a66723	Support rolling log files (#10147 ) * apply log file rolling strategy * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * Use absolute log path and allow spaces in log path * Update log4j2 configuration * apply FileAppender to ZooKeeper * DO NOT redirect application's console log to file in supervisor	2021-12-03 21:32:01 +08:00
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Daniel Koepke	497f2a1051	Allow spaces in java home. (#11407 ) Quote the $java_exec var in examples/bin/verify-java to support spaces in DRUID_JAVA_HOME/JAVA_HOME. At present, the steps before and after the version check properly quote the path, but the version check spuriously fails when pointing to a Java 8 install that has a space in its path.	2021-07-05 18:50:36 +05:30
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Vyatcheslav Mogilevsky	b0432be07a	Apache archive mirror (#10979 ) * Ability to use mirror of archive.apache.org * Ability to use mirror of archive.apache.org: documentation * Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction	2021-03-11 09:07:51 -08:00
misqos	e684b83e29	Add the ability to supply client certificate to dsql comand line tool. (#10765 )	2021-02-11 20:16:47 -08:00
Harini Rajendran	c2e26d2e1c	Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679 ) Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior. Also added example config files for indexer process that can be used to spin up the indexer process.	2020-12-14 19:04:14 -08:00
Vyatcheslav Mogilevsky	5324785eac	integration tests fix: update base image for hadoop containers to centos 7 (#10638 ) LGTM	2020-12-08 11:00:51 -08:00
Atul Mohan	b6ad790dc7	Support combining inputsource for parallel ingestion (#10387 ) * Add combining inputsource * Fix documentation Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-09-15 16:25:35 -07:00
Atul Mohan	06539bc828	Set default server.maxsize to the sum of segment cache (#10255 ) * Default server.maxsize * Remove maxsize refs from config Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-10 09:21:22 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203 ) * support unit suffix on byte-related properties * add doc * change default value of byte-related properites in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Gian Merlino	479c290fb9	Add QueryResource to log4j2 template. (#9735 )	2020-04-22 09:18:45 -07:00
Clint Wylie	4d277dbf99	Fix double count ssl connection metrics (#9594 ) * fix double counted jetty/numOpenConnections metric for ssl connections * tests * more better * style	2020-04-03 23:29:23 -07:00
Suneet Saldanha	af3337dac8	DruidInputSource can add new dimensions during re-ingestion (#9590 ) * WIP integration tests * Add integration test for ingestion with transformSpec * WIP almost working tests * Add ignored tests * checkstyle stuff * remove newPage from index task ingestion spec * more test cleanup * still not quite working * Actually disable the tests * working tests * fix codestyle * dont use junit in integration tests * actually fix the bug * fix checkstyle * bring index tests closer to reindex tests	2020-04-02 17:32:31 -07:00
Maytas Monsereenusorn	e9888f41cb	Modify check java version script to indicate experimental support for Java 11 (#9455 ) * Modify check java version script to indicate experimental support for Java 11 * update docs	2020-03-11 09:22:39 -07:00
Chi Cao Minh	26eeba636a	Make java version check work on all shells (#9376 ) * Make java version check work on all shells Previously, "perl verify-java" would fail on shells like zsh, which would cause the quickstart scripts (e.g., bin/start-micro-quickstart) to fail unless the DRUID_SKIP_JAVA_SKIP environment variable is set. * Support dash (ubuntu)	2020-02-19 13:44:00 -08:00
Clint Wylie	b55657cc26	fix protobuf extension packaging and docs (#9320 ) * fix protobuf extension packaging and docs * fix paths * Update protobuf.md * Update protobuf.md	2020-02-07 09:26:52 -08:00
Suneet Saldanha	180c622e0f	Minor doc updates (#9217 ) * update string first last aggs * update kafka ingestion specs in docs * remove unnecessary parser spec	2020-01-20 11:34:37 -08:00
Suneet Saldanha	85a3d416b0	Tutorials use new ingestion spec where possible (#9155 ) * Tutorials use new ingestion spec where possible There are 2 main changes * Use task type index_parallel instead of index * Remove the use of parser + firehose in favor of inputFormat + inputSource index_parallel is the preferred method starting in 0.17. Setting the job to index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent of an index task Instead of using a parserSpec, dimensionSpec and timestampSpec have been promoted to the dataSchema. The format is described in the ioConfig as the inputFormat. There are a few cases where the new format is not supported * Hadoop must use firehoses instead of the inputSource and inputFormat * There is no equivalent of a combining firehose as an inputSource * A Combining firehose does not support index_parallel * fix typo	2020-01-15 14:08:29 -08:00
Suneet Saldanha	3325da1718	Allow startup scripts to specify java home (#9021 ) * Allow startup scripts to specify java home The startup scripts now look for java in 3 locations. The order is from most related to druid to least, ie ${DRUID_JAVA_HOME} ${JAVA_HOME} ${PATH} * Update fn names and clean up code * final round of fixes * fix spellcheck	2019-12-12 21:36:00 -08:00
Gian Merlino	adb72fe8d5	Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1. (#8942 )	2019-11-26 16:05:15 -08:00
Surekha	d628bebbd7	Make supervisor API similar to submit task API (#8810 ) * accept spec or dataSchema, tuningConfig, ioConfig while submitting task json * fix test * update docs * lgtm warning * Add original constructor back to IndexTask to minimize changes * fix indentation in docs * Allow spec to be specified in supervisor schema * undo IndexTask spec changes * update docs * Add Nullable and deprecated annotations * remove deprecated configs from SeekableStreamSupervisorSpec * remove nullable annotation	2019-11-20 10:04:41 -08:00
Gian Merlino	c44452f0c1	Tidy up lifecycle, query, and ingestion logging. (#8889 ) * Tidy up lifecycle, query, and ingestion logging. The goal of this patch is to improve the clarity and usefulness of Druid's logging for cluster operators. For more information, see https://twitter.com/cowtowncoder/status/1195469299814555648. Concretely, this patch does the following: - Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the goal of reducing redundancy and improving clarity by avoiding showing rarely-useful log messages. This includes most "starting" and "stopping" messages, and most messages related to individual columns. - Adds new log4j2 templates that show operators how to enabled DEBUG logging for certain important packages. - Eliminate stack traces for query errors, unless log level is DEBUG or more. This is useful because query errors often indicate user error rather than system error, but dumping stack trace often gave operators the impression that there was a system failure. - Adds task id to Appenderator, AppenderatorDriver thread names. In the default log4j2 configuration, this will put them in log lines as well. It's very useful if a user is using the Indexer, where multiple tasks run in the same JVM. - More consistent terminology when it comes to "sequences" (sets of segments that are handed-off together by Kafka ingestion) and "offsets" (cursors in partitions). These terms had been confused in some log messages due to the fact that Kinesis calls offsets "sequence numbers". - Replaces some ugly toString calls with either the JSONification or something more operator-accessible (like a URL or segment identifier, instead of JSON object representing the same). * Adjustments. * Adjust integration test.	2019-11-19 13:57:58 -08:00
Chi Cao Minh	8365bdf62a	Address security vulnerabilities (#8878 ) * Address security vulnerabilities Security vulnerabilities addressed by upgrading 3rd party libs: - Upgrade avro-ipc to 1.9.1 - sonatype-2019-0115 - Upgrade caffeine to 2.8.0 - sonatype-2019-0282 - Upgrade commons-beanutils to 1.9.4 - CVE-2014-0114 - Upgrade commons-codec to 1.13 - sonatype-2012-0050 - Upgrade commons-compress to 1.19 - CVE-2019-12402 - sonatype-2018-0293 - Upgrade hadoop-common to 2.8.5 - CVE-2018-11767 - Upgrade hadoop-mapreduce-client-core to 2.8.5 - CVE-2017-3166 - Upgrade hibernate-validator to 5.2.5 - CVE-2017-7536 - Upgrade httpclient to 4.5.10 - sonatype-2017-0359 - Upgrade icu4j to 55.1 - CVE-2014-8147 - Upgrade jackson-databind to 2.6.7.3: - CVE-2017-7525 - Upgrade jetty-http to 9.4.12: - CVE-2017-7657 - CVE-2017-7658 - CVE-2017-7656 - CVE-2018-12545 - Upgrade log4j-core to 2.8.2 - CVE-2017-5645: - Upgrade netty to 3.10.6 - CVE-2015-2156 - Upgrade netty-common to 4.1.42 - CVE-2019-9518 - Upgrade netty-codec-http to 4.1.42 - CVE-2019-16869 - Upgrade nimbus-jose-jwt to 4.41.1 - CVE-2017-12972 - CVE-2017-12974 - Upgrade plexus-utils to 3.0.24 - CVE-2017-1000487 - sonatype-2015-0173 - sonatype-2016-0398 - Upgrade postgresql to 42.2.8 - CVE-2018-10936 Note that if users are using JDBC lookups with postgres, they may need to update the JDBC jar used by the lookup extension. * Fix license for postgresql	2019-11-19 09:14:33 -08:00
Gian Merlino	e70b71c90f	Fix verify script. (#8798 ) Accidentally missed some quote escaping in #8794.	2019-10-30 23:30:01 -07:00
Gian Merlino	edb3b00d26	Startup scripts: verify Java 8 (exactly), improve port/java verification messages. (#8794 ) * Startup scripts: verify Java 8 (exactly), improve port/java verification messages. Java 11 compatibility isn't fully baked yet (users have reported various issues on Java 11), so block startup with an error message unless Java 8 is found. Allow overriding this decision with an environment variable. * Message adjustments.	2019-10-30 22:37:05 -07:00
Gian Merlino	7605c23354	Remove Tranquility configs and certain doc references. (#8793 ) Since it hasn't received updates or community interest in a while, it makes sense to de-emphasize it in the distribution and most documentation (outside of simple mentions of its existence).	2019-10-30 16:30:16 -07:00
Gian Merlino	c922d2c3c9	Use bundled ZooKeeper in tutorials. (#8792 )	2019-10-30 16:17:28 -07:00
Chi Cao Minh	ad615438f6	Fix Hadoop tutorial Dockerfile (#8753 ) When following the instructions from Hadoop batch load tutorial in the docs, building the Docker images fails with: gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error is not recoverable: exiting now Updating nss allows the curl command for downloading Hadoop to succeed while building the Docker image.	2019-10-25 17:58:40 -07:00
Benedict Jin	ec836ae8f8	Fix result of division may be truncated (#8355 ) * Fix result of division may be truncated * Refactoring * Patch comments	2019-09-06 14:45:52 -07:00
Jonathan Wei	c626452b47	Add nano-quickstart single server example configuration (#8390 ) * Add nano-quickstart single server example configuration * Use two workers * Shrink processing buffers	2019-08-24 22:07:20 -07:00
Clint Wylie	42a7b8849a	remove FirehoseV2 and realtime node extensions (#8020 ) * remove firehosev2 and realtime node extensions * revert intellij stuff * rat exclusion	2019-07-04 15:40:22 -07:00
Benedict Jin	8788849bab	Bump commons-validator from 1.4.0 to 1.5.1 (#7987 )	2019-06-27 22:00:49 -07:00
Clint Wylie	71997c16a2	switch links from druid.io to druid.apache.org (#7914 ) * switch links from druid.io to druid.apache.org * fix it	2019-06-18 09:06:27 -07:00
Clint Wylie	8117222da3	use right port for kafka tutorial, reinfoce that tutorials assume you are using micro-quickstart single-server configuration (#7862 )	2019-06-11 08:50:52 -07:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Merlin Lee	26fad7e06a	Add checkstyle for "Local variable names shouldn't start with capital" (#7681 ) * Add checkstyle for "Local variable names shouldn't start with capital" * Adjust some local variables to constants * Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()	2019-05-23 18:40:28 +02:00

1182 Commits