Commit Graph

1166 Commits

Author SHA1 Message Date
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* updates from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularitySpec to example
2021-03-24 11:41:44 -07:00
Vyatcheslav Mogilevsky b0432be07a
Apache archive mirror (#10979)
* Ability to use mirror of archive.apache.org

* Ability to use mirror of archive.apache.org: documentation

* Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction
2021-03-11 09:07:51 -08:00
misqos e684b83e29
Add the ability to supply a client certificate to the dsql command-line tool. (#10765) 2021-02-11 20:16:47 -08:00
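The client-certificate option in #10765 can be illustrated with Python's standard `ssl` module. This is a minimal sketch, not dsql's actual code: the function and parameter names are hypothetical, and it assumes the certificate and key are PEM files.

```python
import ssl
from typing import Optional

def make_ssl_context(ca_file: Optional[str] = None,
                     client_cert: Optional[str] = None,
                     client_key: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that can optionally present a client certificate."""
    if client_key is not None and client_cert is None:
        raise ValueError("a client key was given without a client certificate")
    # Verify the server against the given CA bundle, or the system defaults.
    context = ssl.create_default_context(cafile=ca_file)
    if client_cert is not None:
        # keyfile=None means the private key is expected inside client_cert.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

The resulting context could then be passed to `urllib.request.urlopen(request, context=ctx)` when talking to the Broker or Router.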
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support the status/selfDiscovered endpoint, so this change fixes that expected behavior.

Also added example config files for the indexer process that can be used to spin it up.
2020-12-14 19:04:14 -08:00
Vyatcheslav Mogilevsky 5324785eac
integration tests fix: update base image for hadoop containers to centos 7 (#10638)
LGTM
2020-12-08 11:00:51 -08:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
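A combining input source wraps several delegate input sources so that a single parallel ingestion task reads from all of them. The sketch below builds such a spec as a Python dict. The `type`/`delegates` field names and the delegate shapes follow the Druid input source docs as best understood here and should be checked against the docs for your version; the helper function and the paths are hypothetical.

```python
import json

def combining_input_source(*delegates: dict) -> dict:
    """Wrap several input sources so one parallel task reads from all of them."""
    if not delegates:
        raise ValueError("at least one delegate input source is required")
    return {"type": "combining", "delegates": list(delegates)}

# Example: re-ingest an existing datasource together with new local files.
source = combining_input_source(
    {"type": "local", "baseDir": "/data/new", "filter": "*.json"},
    {"type": "druid", "dataSource": "wikipedia", "interval": "2020-01-01/2020-02-01"},
)
print(json.dumps(source, indent=2))
```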
Atul Mohan 06539bc828
Set default server.maxsize to the sum of segment cache (#10255)
* Default server.maxsize

* Remove maxsize refs from config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-10 09:21:22 -07:00
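The new default in #10255 amounts to adding up the per-location limits. A minimal sketch, assuming `druid.segmentCache.locations` is a JSON array whose entries each carry a numeric `maxSize` in bytes; the helper name and paths are hypothetical.

```python
import json

def default_server_max_size(segment_cache_locations: str) -> int:
    """Default druid.server.maxSize: the sum of every cache location's maxSize."""
    locations = json.loads(segment_cache_locations)
    return sum(int(location["maxSize"]) for location in locations)

locations = (
    '[{"path": "/mnt/disk1/segments", "maxSize": 300000000000},'
    ' {"path": "/mnt/disk2/segments", "maxSize": 200000000000}]'
)
print(default_server_max_size(locations))  # 500000000000
```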
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properties in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
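A rough sketch of parsing a unit-suffixed byte value and reading it "safely" as an int. The unit table (decimal K/M/G/T/P, binary KiB/MiB/GiB/TiB/PiB, matched case-insensitively) is an assumption rather than a copy of Druid's `HumanReadableBytes`, and the function names are hypothetical; `parse_bytes_as_int` mirrors the idea behind `getBytesInInt` by rejecting values that overflow a 32-bit signed int.

```python
import re

# Assumed unit table: decimal suffixes K..P and binary suffixes KiB..PiB,
# matched case-insensitively. Druid's accepted set may differ.
_UNITS = {
    "": 1,
    "k": 1000, "m": 1000 ** 2, "g": 1000 ** 3, "t": 1000 ** 4, "p": 1000 ** 5,
    "kib": 1024, "mib": 1024 ** 2, "gib": 1024 ** 3, "tib": 1024 ** 4, "pib": 1024 ** 5,
}
_PATTERN = re.compile(r"\s*(\d+)\s*([A-Za-z]*)\s*")
INT_MAX = 2 ** 31 - 1  # same value as Java's Integer.MAX_VALUE

def parse_bytes(text: str) -> int:
    """Parse "2048", "1G", or "500MiB" into a byte count."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid byte value: {text!r}")
    number, unit = match.groups()
    multiplier = _UNITS.get(unit.lower())
    if multiplier is None:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    return int(number) * multiplier

def parse_bytes_as_int(text: str) -> int:
    """Like parse_bytes, but reject values that overflow a 32-bit signed int."""
    value = parse_bytes(text)
    if value > INT_MAX:
        raise ValueError(f"{text!r} is {value} bytes, more than {INT_MAX}")
    return value

print(parse_bytes("500MiB"))  # 524288000
```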
Gian Merlino 479c290fb9
Add QueryResource to log4j2 template. (#9735) 2020-04-22 09:18:45 -07:00
Clint Wylie 4d277dbf99
Fix double count ssl connection metrics (#9594)
* fix double counted jetty/numOpenConnections metric for ssl connections

* tests

* more better

* style
2020-04-03 23:29:23 -07:00
Suneet Saldanha af3337dac8
DruidInputSource can add new dimensions during re-ingestion (#9590)
* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* don't use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests
2020-04-02 17:32:31 -07:00
Maytas Monsereenusorn e9888f41cb
Modify check java version script to indicate experimental support for Java 11 (#9455)
* Modify check java version script to indicate experimental support for Java 11

* update docs
2020-03-11 09:22:39 -07:00
Chi Cao Minh 26eeba636a
Make java version check work on all shells (#9376)
* Make java version check work on all shells

Previously, "perl verify-java" would fail on shells like zsh, which
would cause the quickstart scripts (e.g., bin/start-micro-quickstart) to
fail unless the DRUID_SKIP_JAVA_CHECK environment variable is set.

* Support dash (ubuntu)
2020-02-19 13:44:00 -08:00
Clint Wylie b55657cc26
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs

* fix paths

* Update protobuf.md

* Update protobuf.md
2020-02-07 09:26:52 -08:00
Suneet Saldanha 180c622e0f Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks (1) is equivalent
to an index task.

Instead of using a parserSpec, dimensionsSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
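The spec shape described in #9155 can be sketched as a Python dict: `timestampSpec` and `dimensionsSpec` (note the plural) sit directly in `dataSchema`, while `inputSource` and `inputFormat` live in `ioConfig`. The data source, directory, filter, and timestamp column are placeholders, and the helper is hypothetical.

```python
import json

def parallel_index_task(data_source: str, base_dir: str, file_filter: str) -> dict:
    """Skeleton of an index_parallel task using inputSource + inputFormat."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # Promoted out of the old parserSpec into dataSchema.
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "none"},
            },
            "ioConfig": {
                "type": "index_parallel",
                # Replaces the firehose.
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                # Replaces the format that used to live in the parser.
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                # With a single subtask the job behaves like a plain index task.
                "maxNumConcurrentSubTasks": 1,
            },
        },
    }
```

Serializing the result with `json.dumps(..., indent=2)` gives a task body that can be reviewed before submission.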
Suneet Saldanha 3325da1718 Allow startup scripts to specify java home (#9021)
* Allow startup scripts to specify java home

The startup scripts now look for java in 3 locations, in order from
most Druid-specific to least, i.e.:
    ${DRUID_JAVA_HOME}
    ${JAVA_HOME}
    ${PATH}

* Update fn names and clean up code

* final round of fixes

* fix spellcheck
2019-12-12 21:36:00 -08:00
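The lookup order from #9021 can be sketched in Python. This assumes each `*_HOME` variable points at a directory containing `bin/java` and skips a variable whose `java` is missing or not executable; the real startup scripts may handle that case differently, and `find_java` is a hypothetical name.

```python
import os
import shutil
from typing import Mapping, Optional

def find_java(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the java executable: DRUID_JAVA_HOME, then JAVA_HOME, then PATH."""
    for var in ("DRUID_JAVA_HOME", "JAVA_HOME"):
        home = env.get(var)
        if home:
            candidate = os.path.join(home, "bin", "java")
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    # Fall back to whatever "java" resolves to on PATH (None if not found).
    return shutil.which("java", path=env.get("PATH", ""))
```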
Gian Merlino adb72fe8d5 Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1. (#8942) 2019-11-26 16:05:15 -08:00
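The idea behind checking both addresses in #8942: depending on platform and socket options, a bind on one address can succeed even though another process already holds the port on the other, so the check treats a port as busy if binding fails on either. A minimal Python sketch; the function name is hypothetical and the real `verify-default-ports` script is not reproduced here.

```python
import errno
import socket

def port_in_use(port: int) -> bool:
    """Report a port as busy if it cannot be bound on INADDR_ANY or on 127.0.0.1."""
    for host in ("0.0.0.0", "127.0.0.1"):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError as exc:
                if exc.errno == errno.EADDRINUSE:
                    return True
                # Other failures (e.g. permission denied) are not "in use".
                raise
    return False
```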
Surekha d628bebbd7 Make supervisor API similar to submit task API (#8810)
* accept spec or dataSchema, tuningConfig, ioConfig while submitting task json

* fix test

* update docs

* lgtm warning

* Add original constructor back to IndexTask to minimize changes

* fix indentation in docs

* Allow spec to be specified in supervisor schema

* undo IndexTask spec changes

* update docs

* Add Nullable and deprecated annotations

* remove deprecated configs from SeekableStreamSupervisorSpec

* remove nullable annotation
2019-11-20 10:04:41 -08:00
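#8810 lets a payload carry `dataSchema`, `ioConfig`, and `tuningConfig` either at the top level or nested under `spec`. A hypothetical normalizer that converts the flat form to the nested one could look like this; rejecting a payload that mixes both forms is a choice made in this sketch, not necessarily Druid's behavior.

```python
from typing import Any, Dict

SPEC_KEYS = ("dataSchema", "ioConfig", "tuningConfig")

def normalize_supervisor_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload in nested form: {"type": ..., "spec": {...}}."""
    if "spec" in payload:
        if any(key in payload for key in SPEC_KEYS):
            raise ValueError("use either 'spec' or top-level dataSchema/ioConfig/tuningConfig")
        return dict(payload)
    # Move the three spec sections under "spec"; keep everything else as-is.
    nested = {key: payload[key] for key in SPEC_KEYS if key in payload}
    result = {key: value for key, value in payload.items() if key not in SPEC_KEYS}
    result["spec"] = nested
    return result
```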
Gian Merlino c44452f0c1 Tidy up lifecycle, query, and ingestion logging. (#8889)
* Tidy up lifecycle, query, and ingestion logging.

The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.

Concretely, this patch does the following:

- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
  goal of reducing redundancy and improving clarity by avoiding
  showing rarely-useful log messages. This includes most "starting"
  and "stopping" messages, and most messages related to individual
  columns.
- Adds new log4j2 templates that show operators how to enable DEBUG
  logging for certain important packages.
- Eliminates stack traces for query errors, unless the log level is DEBUG
  or finer. This is useful because query errors often indicate user
  error rather than system error, but dumping stack traces often gave
  operators the impression that there was a system failure.
- Adds task id to Appenderator, AppenderatorDriver thread names. In
  the default log4j2 configuration, this will put them in log lines
  as well. It's very useful if a user is using the Indexer, where
  multiple tasks run in the same JVM.
- More consistent terminology when it comes to "sequences" (sets of
  segments that are handed-off together by Kafka ingestion) and
  "offsets" (cursors in partitions). These terms had been confused in
  some log messages due to the fact that Kinesis calls offsets
  "sequence numbers".
- Replaces some ugly toString calls with either the JSONification or
  something more operator-accessible (like a URL or segment identifier,
  instead of a JSON object representing the same thing).

* Adjustments.

* Adjust integration test.
2019-11-19 13:57:58 -08:00
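The "no stack trace unless DEBUG" rule from #8889 can be sketched with Python's standard `logging` module: the error is always summarized, but the traceback is attached only when the logger is enabled for DEBUG. Druid itself logs through log4j2 from Java; this only illustrates the policy, and `log_query_error` is a hypothetical name.

```python
import logging

def log_query_error(logger: logging.Logger, query_id: str, exc: BaseException) -> None:
    """Log a query failure; attach the stack trace only when DEBUG is enabled."""
    logger.error(
        "Query [%s] failed: %s",
        query_id,
        exc,
        # exc_info accepts an exception instance; None suppresses the traceback.
        exc_info=exc if logger.isEnabledFor(logging.DEBUG) else None,
    )
```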
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Gian Merlino e70b71c90f
Fix verify script. (#8798)
Accidentally missed some quote escaping in #8794.
2019-10-30 23:30:01 -07:00
Gian Merlino edb3b00d26 Startup scripts: verify Java 8 (exactly), improve port/java verification messages. (#8794)
* Startup scripts: verify Java 8 (exactly), improve port/java verification messages.

Java 11 compatibility isn't fully baked yet (users have reported various
issues on Java 11), so block startup with an error message unless Java 8
is found. Allow overriding this decision with an environment variable.

* Message adjustments.
2019-10-30 22:37:05 -07:00
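The "Java 8 (exactly)" check in #8794 hinges on mapping both the old `1.x` version scheme and the newer one to a major version. A minimal sketch, assuming an override variable named `DRUID_SKIP_JAVA_CHECK` set to `1`; the variable name, its value, and the function names are assumptions here.

```python
import os
import re
from typing import Mapping

def java_major_version(version: str) -> int:
    """Map "1.8.0_232" -> 8 and "11.0.5" -> 11."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Versions before Java 9 use the "1.x" scheme.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first

def java_check_passes(version: str, env: Mapping[str, str] = os.environ) -> bool:
    """Require exactly Java 8 unless the (assumed) override variable is set."""
    if env.get("DRUID_SKIP_JAVA_CHECK") == "1":
        return True
    return java_major_version(version) == 8
```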
Gian Merlino 7605c23354 Remove Tranquility configs and certain doc references. (#8793)
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
2019-10-30 16:30:16 -07:00
Gian Merlino c922d2c3c9 Use bundled ZooKeeper in tutorials. (#8792) 2019-10-30 16:17:28 -07:00
Chi Cao Minh ad615438f6 Fix Hadoop tutorial Dockerfile (#8753)
When following the instructions from the Hadoop batch load tutorial in the
docs, building the Docker images fails with:

  gzip: stdin: unexpected end of file
  tar: Child returned status 1
  tar: Error is not recoverable: exiting now

Updating nss allows the curl command for downloading Hadoop to succeed
while building the Docker image.
2019-10-25 17:58:40 -07:00
Benedict Jin ec836ae8f8 Fix possibly truncated division result (#8355)
* Fix possibly truncated division result

* Refactoring

* Patch comments
2019-09-06 14:45:52 -07:00
Jonathan Wei c626452b47 Add nano-quickstart single server example configuration (#8390)
* Add nano-quickstart single server example configuration

* Use two workers

* Shrink processing buffers
2019-08-24 22:07:20 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Benedict Jin 8788849bab Bump commons-validator from 1.4.0 to 1.5.1 (#7987) 2019-06-27 22:00:49 -07:00
Clint Wylie 71997c16a2 switch links from druid.io to druid.apache.org (#7914)
* switch links from druid.io to druid.apache.org

* fix it
2019-06-18 09:06:27 -07:00
Clint Wylie 8117222da3 use the right port for kafka tutorial, reinforce that tutorials assume you are using micro-quickstart single-server configuration (#7862) 2019-06-11 08:50:52 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Merlin Lee 26fad7e06a Add checkstyle for "Local variable names shouldn't start with capital" (#7681)
* Add checkstyle for "Local variable names shouldn't start with capital"

* Adjust some local variables to constants

* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
2019-05-23 18:40:28 +02:00
Gian Merlino b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
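Pinning checksums for artifacts fetched over http, as #7722 describes, comes down to hashing the downloaded bytes and comparing them against a hardcoded value. The choice of SHA-512 and the function name are assumptions; the commit does not say which digest each artifact uses.

```python
import hashlib
import hmac

def verify_sha512(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if data does not match the pinned SHA-512 checksum."""
    actual = hashlib.sha512(data).hexdigest()
    # Normalize the pinned value so upper-case or padded hex still matches.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")
```

A constant-time comparison is not strictly needed for a public checksum, but it costs nothing.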
Merlin Lee 5f08b0b474 Add checkstyle for "Prohibit @author tags in Javadoc" (#7682)
* Add checkstyle for "Prohibit @author tags in Javadoc"

* Add "Do not use author tags/information in the code" back to CONTRIBUTING.md
2019-05-20 00:09:51 -07:00
Jonathan Wei d667655871 Add basic tuning guide, getting started page, updated clustering docs (#7629)
* Add basic tuning guide, getting started page, updated clustering docs

* Add note about caching, fix tutorial paths

* Adjust hadoop wording

* Add license

* Tweak

* Shrink overlord heaps, fix tutorial urls

* Tweak xlarge peon, update peon sizing

* Update Data peon buffer size

* Fix cluster start scripts

* Add upper level _common to classpath

* Fix cluster data/query confs

* Address PR comments

* Elaborate on connection pools

* PR comments

* Increase druid.broker.http.maxQueuedBytes

* Add guidelines for broker backpressure

* PR comments
2019-05-16 11:13:48 -07:00
Surekha 917106985f Update tutorial to delete data (#7577)
* Update tutorial to delete data

* update tutorial, remove old ways to drop data

* PR comments
2019-05-15 14:40:06 -07:00
Jonathan Wei 7c2ca474da Add single-machine deployment example cfgs and scripts (#7590)
* Add single-machine deployment example cfgs and scripts

* Add (8u92+)

* Use combined coordinator-overlord for single machine confs

* RAT fix
2019-05-06 19:11:13 -07:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Jonathan Wei 5486c2abf8
Update LICENSE and NOTICE files (#7026)
* Update LICENSE and NOTICE files

* Update react-table version
2019-03-04 18:45:22 -08:00
Justin Borromeo c7082ba36e Added friendlier dsql error message for 405 (which occurs when druid.sql.enabled=false) (#7112)
* Added friendlier error message for dsql 405

* no extra char

* Changed error message

* fixed weird spacing
2019-02-27 20:40:30 -08:00
Jonathan Wei 3d247498ef Update tutorials for 0.14.0-incubating (#7157) 2019-02-27 19:50:31 -08:00
Jihoon Son 6b232d8195 Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true (#7079)
* Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true

* typo

* add a warning
2019-02-27 16:02:51 -08:00
Jonathan Wei fafbc4a80e
Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Jonathan Wei 8bc5eaa908
Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Gian Merlino 4e426327bb Some adjustments to config examples. (#6973)
* Some adjustments to config examples.

- Add ExitOnOutOfMemoryError to jvm.config examples. It was added a
pretty long time ago (8u92) and is helpful since it prevents zombie
processes from hanging around. (OOMEs tend to bork things)
- Disable Broker caching and enable it on Historicals in example
configs. This config tends to scale better since it enables the
Historicals to merge results rather than sending everything by-segment
to the Broker. Also switch to "caffeine" cache from "local".
- Increase concurrency a bit for Broker example config.
- Enable SQL in the example config, a baby step towards making SQL
more of a thing. (It's still off by default in the code.)
- Reduce memory use a bit for the quickstart configs.
- Add example Router configs, in case someone wants to use that. One
reason might be to get the fancy new console (#6923).

* Add example Router configs.

* Fix up router example properties.

* Add router to quickstart supervise conf.
2019-01-31 17:59:39 -08:00
Gian Merlino ac4c7e21a2 Enhancements to dsql. (#6929)
- CLI history, basic autocomplete through readline.
- Include timeout in query context.
- Group CLI options into... groups.
2019-01-28 17:02:43 -08:00
Gian Merlino ba33bdc497 Add exclusions to limit doubling up on jars. (#6927) 2019-01-28 11:06:30 -08:00
Jihoon Son c35a39d70b
Add support for maxRowsPerSegment for auto compaction (#6780)
* Add support for maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00