Eliminates two common sources of noise with Kafka supervisors that have
large numbers of tasks and partitions:
1) Log the report at DEBUG rather than INFO level at each run cycle.
It can get quite large, and can be retrieved via API when needed.
2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals
package. Avoids a log message per-partition per-minute as part of seeking
to the latest offset in the reporting thread. In the tasks, where this
sort of logging might be more useful, we have another log message with
the same information: "Seeking partition[%s] to[%s]".
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
using one property for hiding properties, updated the index.md to document hiddenProperties
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Added java docs
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint
Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties
fixed typo and removed redundant space
Co-authored-by: zemin <zemin.piao@adyen.com>
The docs say Java 17 support is experimental, and give tips on running
successfully with Java 17.
This patch also removes java.base/jdk.internal.perf and
jdk.management/com.sun.management.internal from the list of required
exports and opens, because they were formerly needed for JvmMonitor,
which was rewritten in #12481 to use MXBeans instead.
* changes to run examples on macos where cd command returns current directory
* Update examples/bin/run-druid
Co-authored-by: Frank Chen <frankchen@apache.org>
* merging
* sending output of cd command to /dev/null
Co-authored-by: Frank Chen <frankchen@apache.org>
* Improved Java 17 support and Java runtime docs.
1) Add a "Java runtime" doc page with information about supported
Java versions, garbage collection, and strong encapsulation..
2) Update asm and equalsverifier to versions that support Java 17.
3) Add additional "--add-opens" lines to surefire configuration, so
tests can pass successfully under Java 17.
4) Switch openjdk15 tests to openjdk17.
5) Update FrameFile to specifically mention Java runtime incompatibility
as the cause of not being able to use Memory.map.
6) Update SegmentLoadDropHandler to log an error for Errors too, not
just Exceptions. This is important because an IllegalAccessError is
encountered when the correct "--add-opens" line is not provided,
which would otherwise be silently ignored.
7) Update example configs to use druid.indexer.runner.javaOptsArray
instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)
* Adjustments.
* Use run-java in more places.
* Add run-java.
* Update .gitignore.
* Exclude hadoop-client-api.
Brought in when building on Java 17.
* Swap one more usage of java.
* Fix the run-java script.
* Fix flag.
* Include link to Temurin.
* Spelling.
* Update examples/bin/run-java
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Python 3 support for post-index-task.
Useful when running on macOS or any other system that
doesn't have Python 2.
* Encode JSON returned by read_task_file.
* Adjust.
* Skip needless loads.
* Add a decode.
* Additional decodes needed.
* Add TIME_IN_INTERVAL SQL operator.
The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.
* SqlParserPos cannot be null here.
* Remove unused import.
* Doc updates.
* Add words to dictionary.
* Service stdout log files, move logs to log/.
Two changes that make log behavior cleaner:
1) Redirect messages from the Java runtime to their own log files.
Otherwise, they would get jumbled up in the output of the all-in-one
start command.
2) Use log/ instead of bin/log/ for the default log directory. Makes them
easier to find.
Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.
* Spelling.
* See if code formatting affects spelling.
Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.
The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: be45d20302/centos/8u282-8.52.0.23/Dockerfile.
Add config for eager / lazy connection initialization in ResourcePool
Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.
While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.
Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.
A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.
If set to false, lazy initialization of connection resources takes place.
NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR
Algorithm
The current implementation relies on the creation of maxSize resources eagerly.
The new implementation's behaviour is as follows:
If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.
The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
This PR changes the value of the property `druid.sql.planner.useGroupingSetForExactDistinct` from `false` to `true` in the runtime.properties files, so that newer installations have this property as `true`, while the default still remains as `false`.
The flag determines how queries which contain an aggregation over `DISTINCT` like `SELECT COUNT(DISTINCT foo.dim1) FILTER(WHERE foo.cnt = 1), SUM(foo.cnt) FROM druid.foo` get planned by Calcite. With the flag being set to false, it plans it via joins, whereas with it being set to true, the query is set using grouping sets.
There is a known issue with Calcite (https://github.com/apache/druid/issues/7953), where an NPE is thrown while planning the above query with joins. There is no such issue while planning the query using grouping sets.
Fixes#10744
Fixes:
./bin/node.sh: 44: ./bin/node.sh: source: not found
Could not find java - please run /opt/druid/apache-druid-0.20.0/bin/verify-java to confirm it is installed.
* apply log file rolling strategy
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Use absolute log path and allow spaces in log path
* Update log4j2 configuration
* apply FileAppender to ZooKeeper
* DO NOT redirect application's console log to file in supervisor
changes:
* adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
Quote the $java_exec var in examples/bin/verify-java to support spaces in DRUID_JAVA_HOME/JAVA_HOME. At present, the steps before and after the version check properly quote the path, but the version check spuriously fails when pointing to a Java 8 install that has a space in its path.
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc
* fix links, typos, some reorganization
* fix spelling. TBD still there for work in progress
* updates tutorial examples, adds more clarification around compaction use cases
* add granularity spec to automatic compaction config
* final edits
* spelling fixes
* apply suggestions from review
* upadtes from review
* last edits
* move note
* clarify null
* fix links & spelling
* latest review
* edits to auto-compaction config
* add back rollup
* fix links & spelling
* Update compaction.md
add granularityspec to example
* Ability to use mirror of archive.apache.org
* Ability to use mirror of archive.apache.org: documentation
* Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior.
Also added example config files for indexer process that can be used to spin up the indexer process.
* support unit suffix on byte-related properties
* add doc
* change default value of byte-related properites in example files
* fix coding style
* fix doc
* fix CI
* suppress spelling errors
* improve code according to comments
* rename Bytes to HumanReadableBytes
* add getBytesInInt to get value safely
* improve doc
* fix problem reported by CI
* fix problem reported by CI
* resolve code review comments
* improve error message
* improve code & doc according to comments
* fix CI problem
* improve doc
* suppress spelling check errors
* WIP integration tests
* Add integration test for ingestion with transformSpec
* WIP almost working tests
* Add ignored tests
* checkstyle stuff
* remove newPage from index task ingestion spec
* more test cleanup
* still not quite working
* Actually disable the tests
* working tests
* fix codestyle
* dont use junit in integration tests
* actually fix the bug
* fix checkstyle
* bring index tests closer to reindex tests
* Make java version check work on all shells
Previously, "perl verify-java" would fail on shells like zsh, which
would cause the quickstart scripts (e.g., bin/start-micro-quickstart) to
fail unless the DRUID_SKIP_JAVA_SKIP environment variable is set.
* Support dash (ubuntu)
* Tutorials use new ingestion spec where possible
There are 2 main changes
* Use task type index_parallel instead of index
* Remove the use of parser + firehose in favor of inputFormat + inputSource
index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task
Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.
There are a few cases where the new format is not supported
* Hadoop must use firehoses instead of the inputSource and inputFormat
* There is no equivalent of a combining firehose as an inputSource
* A Combining firehose does not support index_parallel
* fix typo
* Allow startup scripts to specify java home
The startup scripts now look for java in 3 locations. The order is from
most related to druid to least, ie
${DRUID_JAVA_HOME}
${JAVA_HOME}
${PATH}
* Update fn names and clean up code
* final round of fixes
* fix spellcheck
* Tidy up lifecycle, query, and ingestion logging.
The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.
Concretely, this patch does the following:
- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
goal of reducing redundancy and improving clarity by avoiding
showing rarely-useful log messages. This includes most "starting"
and "stopping" messages, and most messages related to individual
columns.
- Adds new log4j2 templates that show operators how to enabled DEBUG
logging for certain important packages.
- Eliminate stack traces for query errors, unless log level is DEBUG
or more. This is useful because query errors often indicate user
error rather than system error, but dumping stack trace often gave
operators the impression that there was a system failure.
- Adds task id to Appenderator, AppenderatorDriver thread names. In
the default log4j2 configuration, this will put them in log lines
as well. It's very useful if a user is using the Indexer, where
multiple tasks run in the same JVM.
- More consistent terminology when it comes to "sequences" (sets of
segments that are handed-off together by Kafka ingestion) and
"offsets" (cursors in partitions). These terms had been confused in
some log messages due to the fact that Kinesis calls offsets
"sequence numbers".
- Replaces some ugly toString calls with either the JSONification or
something more operator-accessible (like a URL or segment identifier,
instead of JSON object representing the same).
* Adjustments.
* Adjust integration test.
* Startup scripts: verify Java 8 (exactly), improve port/java verification messages.
Java 11 compatibility isn't fully baked yet (users have reported various
issues on Java 11), so block startup with an error message unless Java 8
is found. Allow overriding this decision with an environment variable.
* Message adjustments.
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).