Commit Graph

43 Commits

Author SHA1 Message Date
Paul Rogers a580aca551
Python Druid API for use in notebooks (#13787)
Python Druid API for use in notebooks

Revises existing notebooks and readme to reference
the new API.

Notebook to explain the new API.

Split README into a console version and a notebook
version to work around lack of a nice display for
md files.

Update the REST API notebook to use simpler Requests calls

Converted the SQL tutorial to use the Python library

README file, converted to using properties

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-03-04 18:25:19 -08:00
Katya Macedo 1595653e6f
docs: add a link for the Druid SQL tutorial (#13468)
* docs: add juptyer API tutorial for API and jupyter tutorial index (#3)

(cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583)

* update prereqs and fix jupyterlab name

* Removing notebook since 13345 has it

13345 should be merged first

* update contributing instructions

* docs: link to the  Druid SQL tutorial

* Add link to partitioning

* fix merge conflict

* Saving

* Update docs/tutorials/tutorial-jupyter-index.md

* Remove partitioning

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: brian.le <brian.le@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-22 09:36:13 -08:00
Katya Macedo 58d9720b00
docs: notebook only for SQL tutorial (#13465)
CI Failures seem unrelated to docs

* docs: notebook only for SQL tutorial

* Update logical operators section

* Fix typo

* Adopt review suggestions

* Update examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb

* Update examples, add link to keywords

* Update after review

* Update per review comments

* Add links
2023-02-08 20:04:53 -08:00
317brian d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342) 2022-12-16 13:33:50 -08:00
317brian 668d1fad6b
docs: notebook only for API tutorial (#13345)
* docs: notebook for API tutorial

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* address the other comments

* typo

* add commentary to outputs

* address feedback from will

* delete unnecessary comment

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-15 13:16:07 -08:00
Charles Smith efbb58e90e
docs: remove maxRowsPerSegment where appropriate (#12071)
* remove maxRowsPerSegment where appropriate

* fix tutorial, accept suggestions

* Update docs/design/coordinator.md

* additional tutorial file

* fix initial index spec

* accept comments

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add back comment on maxrows per segment

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rm duplicate entry

* Update native-batch-simple-task.md

remove ref to `maxrowspersegment`

* Update native-batch.md

remove ref to `maxrowspersegment`

* final tenticles

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-28 16:52:13 +05:30
Gian Merlino 0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
Tiffany Yeh 665c926824
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)
Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.

The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: be45d20302/centos/8u282-8.52.0.23/Dockerfile.
2022-04-11 20:28:09 +05:30
Karan Kumar a080fcdd7b
Fixing hadoop 3 Dockerfile (#12284) 2022-02-26 19:18:29 +05:30
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
2021-10-30 22:46:24 +05:30
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* upadtes from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularityspec to example
2021-03-24 11:41:44 -07:00
Vyatcheslav Mogilevsky b0432be07a
Apache archive mirror (#10979)
* Ability to use mirror of archive.apache.org

* Ability to use mirror of archive.apache.org: documentation

* Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction
2021-03-11 09:07:51 -08:00
Vyatcheslav Mogilevsky 5324785eac
integration tests fix: update base image for hadoop containers to centos 7 (#10638)
LGTM
2020-12-08 11:00:51 -08:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
Suneet Saldanha af3337dac8
DruidInputSource can add new dimensions during re-ingestion (#9590)
* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* dont use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests
2020-04-02 17:32:31 -07:00
Clint Wylie b55657cc26
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs

* fix paths

* Update protobuf.md

* Update protobuf.md
2020-02-07 09:26:52 -08:00
Suneet Saldanha 180c622e0f Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
Surekha d628bebbd7 Make supervisor API similar to submit task API (#8810)
* accept spec or dataSchema, tuningConfig, ioConfig while submitting task json

* fix test

* update docs

* lgtm warning

* Add original constructor back to IndexTask to minimize changes

* fix indentation in docs

* Allow spec to be specified in supervisor schema

* undo IndexTask spec changes

* update docs

* Add Nullable and deprecated annotations

* remove deprecated configs from SeekableStreamSupervisorSpec

* remove nullable annotation
2019-11-20 10:04:41 -08:00
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3:
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12:
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645:
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Chi Cao Minh ad615438f6 Fix Hadoop tutorial Dockerfile (#8753)
When following the instructions from Hadoop batch load tutorial in the
docs, building the Docker images fails with:

  gzip: stdin: unexpected end of file
  tar: Child returned status 1
  tar: Error is not recoverable: exiting now

Updating nss allows the curl command for downloading Hadoop to succeed
while building the Docker image.
2019-10-25 17:58:40 -07:00
Clint Wylie 8117222da3 use right port for kafka tutorial, reinfoce that tutorials assume you are using micro-quickstart single-server configuration (#7862) 2019-06-11 08:50:52 -07:00
Gian Merlino b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
Surekha 917106985f Update tutorial to delete data (#7577)
* Update tutorial to delete data

* update tutorial, remove old ways to drop data

* PR comments
2019-05-15 14:40:06 -07:00
Jonathan Wei 7c2ca474da Add single-machine deployment example cfgs and scripts (#7590)
* Add single-machine deployment example cfgs and scripts

* Add (8u92+)

* Use combined coordinator-overlord for single machine confs

* RAT fix
2019-05-06 19:11:13 -07:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Jonathan Wei 5486c2abf8
Update LICENSE and NOTICE files (#7026)
* Update LICENSE and NOTICE files

* Update react-table version
2019-03-04 18:45:22 -08:00
Jonathan Wei 3d247498ef Update tutorials for 0.14.0-incubating (#7157) 2019-02-27 19:50:31 -08:00
Jihoon Son 6b232d8195 Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true (#7079)
* Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true

* typo

* add a warning
2019-02-27 16:02:51 -08:00
Gian Merlino 4e426327bb Some adjustments to config examples. (#6973)
* Some adjustments to config examples.

- Add ExitOnOutOfMemoryError to jvm.config examples. It was added a
pretty long time ago (8u92) and is helpful since it prevents zombie
processes from hanging around. (OOMEs tend to bork things)
- Disable Broker caching and enable it on Historicals in example
configs. This config tends to scale better since it enables the
Historicals to merge results rather than sending everything by-segment
to the Broker. Also switch to "caffeine" cache from "local".
- Increase concurrency a bit for Broker example config.
- Enable SQL in the example config, a baby step towards making SQL
more of a thing. (It's still off by default in the code.)
- Reduce memory use a bit for the quickstart configs.
- Add example Router configs, in case someone wants to use that. One
reason might be to get the fancy new console (#6923).

* Add example Router configs.

* Fix up router example properties.

* Add router to quickstart supervise conf.
2019-01-31 17:59:39 -08:00
Jihoon Son c35a39d70b
Add support maxRowsPerSegment for auto compaction (#6780)
* Add support maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00
David Lim b332021c49 remove extensions from default configs that have configuration/library dependencies and update docs (#6694) 2018-11-30 12:52:46 -08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
David Lim 23ad3d214c fixup docs to download from Apache mirror, fixup tarball name and path, change references from quickstart/* to quickstart/tutorial/* (#6570) 2018-11-01 21:47:29 -07:00
David Lim 1e913a0416 update two license headers to ASF (#6449) 2018-10-11 16:04:23 -07:00
QiuMM d559dfecb2 replace deprecated druid.port by druid.plaintextPort in docs (#6427) 2018-10-09 10:57:01 -07:00
Slim Bouguerra 028354eea8 Adding licenses and enable apache-rat-plugin. (#6215)
* Adding licenses and enable apache-rat-plugi.

Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18

* restore the copywrite of demo_table and add it to the list of allowed ones

Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14

* revirew comments

Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca

* more fixup

Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2

* align

Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
2018-09-18 08:39:26 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Jonathan Wei 24f2e8ba26 New quickstart and tutorials (#6126)
* New quickstart and tutorials

* PR comments

* Fix tranquility
2018-08-09 14:37:52 -06:00
Shimi Kiviti 0ebfb3fbb0 fixed missing gz extension from wikiticker json sample (#5194) 2017-12-28 01:02:12 +09:00
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create new pb obj everytime

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
Gian Merlino 50db86cb17 Quickstart: Use hadoopyString for batch indexing instead of string. (#3263) 2016-07-19 10:18:10 -07:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00