Commit Graph

2131 Commits

Author SHA1 Message Date
Maytas Monsereenusorn 42359c93dd Implement ANY aggregator (#9187)
* Implement ANY aggregator

* Add copyright headers

* Add unit tests

* fix BufferAggregator

* Fix bug in BufferAggregator

* hook up the SQL command

* add check for buffer aggregator

* Address comment

* address comments

* add docs

* Address comments

* add more tests for numeric columns that have null values when run in sql compatible null mode

* fix checkstyle errors

* fix failing tests

* fix failing tests
2020-01-16 14:40:32 -08:00
Suneet Saldanha 92ac22d060 Link javaOpts to middlemanager runtime.properties docs (#9101)
* Link javaOpts to middlemanager runtime.properties docs

* fix broken link

* reword config links
2020-01-15 21:22:49 -08:00
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
Jonathan Wei d1500c1328 Update Kinesis resharding information about task failures (#9104) 2020-01-07 15:44:48 -08:00
Jonathan Wei 58d337186b
Graduation update for ASF release process guide and download links (#9126)
* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml
2020-01-06 15:00:33 -06:00
Jonathan Wei aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Jihoon Son 3c31493772 Add missing docs for http client configurations (#9054)
* Add missing docs for http client configurations

* fix typo

* backticks
2019-12-19 17:41:04 -08:00
Chi Cao Minh 6178f05da6 Fail superbatch range partition multi dim values (#9058)
* Fail superbatch range partition multi dim values

Change the behavior of parallel indexing range partitioning to fail
ingestion if any row had multiple values for the partition dimension.
After this change, the behavior matches that of hadoop indexing.
(Previously, rows with multiple dimension values would be skipped.)

* Improve err msg, rename method, rename test class
2019-12-18 10:14:03 -08:00
Clint Wylie 6881535b48
docs - clarify cache parameters (#9020) 2019-12-13 16:53:45 -08:00
Suneet Saldanha 3325da1718 Allow startup scripts to specify java home (#9021)
* Allow startup scripts to specify java home

The startup scripts now look for java in 3 locations. The order is from
most related to druid to least, ie
    ${DRUID_JAVA_HOME}
    ${JAVA_HOME}
    ${PATH}

* Update fn names and clean up code

* final round of fixes

* fix spellcheck
2019-12-12 21:36:00 -08:00
Himanshu 9236dd9467
optionally enable Jetty ForwardedRequestCustomizer (#9010)
* optionally enable Jetty ForwardedRequestCustomizer

* fix doc build
2019-12-12 17:00:08 -08:00
Benjamin Hopp 13c33c1766 Update architecture.md (#9015) 2019-12-11 19:05:50 -08:00
Jihoon Son e5e1e9c4ee
Fix broken master (#9005)
* Multibinding for NodeRole

* Fix endpoints

* fix doc

* fix test
2019-12-11 15:56:36 -08:00
Parag Jain 24fe824055 add readiness endpoints to processes having initialization delays (#8841) 2019-12-10 17:26:13 -08:00
Chi Cao Minh 3de7ab8523 DataSketches jars in core (#9003)
Having DataSketches jars in core will allow potential improvements, for
example:
- Provide an alternative implementation of HLL:
  https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
- Range partitioning for native parallel batch indexing without having
  the user load extensions on the classpath

Dev mailing list discussion:
https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E
2019-12-10 14:02:34 -08:00
Chi Cao Minh bab78fc80e Parallel indexing single dim partitions (#8925)
* Parallel indexing single dim partitions

Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.

The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:

1) In parallel, determine the distribution of dimension values for each
   input source split.

   `PartialDimensionDistributionTask` uses `StringSketch` to generate
   the approximate distribution of dimension values for each input
   source split. If the rows are ungrouped,
   `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
   uses a Bloom filter to skip rows that would be grouped. The final
   distribution is sent back to the supervisor via
   `DimensionDistributionReport`.

2) The range partitions are determined.

   In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
   supervisor uses `StringSketchMerger` to merge the individual
   `StringSketch`es created in the preceding phase. The merged sketch is
   then used to create the range partitions.

3) In parallel, generate partial range-partitioned segments.

   `PartialRangeSegmentGenerateTask` uses the range partitions
   determined in the preceding phase and
   `RangePartitionCachingLocalSegmentAllocator` to generate
   `SingleDimensionShardSpec`s.  The partition information is sent back
   to the supervisor via `GeneratedGenericPartitionsReport`.

4) The partial range segments are grouped.

   In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
   the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
   necessary for the next phase.

5) In parallel, merge partial range-partitioned segments.

   `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
   retrieve the partial range-partitioned segments generated earlier and
   then merges and publishes them.

* Fix dependencies & forbidden apis

* Fixes for integration test

* Address review comments

* Fix docs, strict compile, sketch check, rollup check

* Fix first shard spec, partition serde, single subtask

* Fix first partition check in test

* Misc rewording/refactoring to address code review

* Fix doc link

* Split batch index integration test

* Do not run parallel-batch-index twice

* Adjust last partition

* Split ITParallelIndexTest to reduce runtime

* Rename test class

* Allow null values in range partitions

* Indicate which phase failed

* Improve asserts in tests
2019-12-09 23:05:49 -08:00
Vadim Ogievetsky 0330744793 Docs: bold Java 8 requirement (#8996)
* bold Java 8 req

* add warning box
2019-12-09 20:23:07 -08:00
Roman Leventov 1c62987783
Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702)
* Add SelfDiscoveryResource

* Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself).

* Extended docs

* Fix brace

* Remove redundant throws in Lifecycle.Handler.stop()

* Import order

* Remove unresolvable link

* Address comments

* tmp

* tmp

* Rollback docker changes

* Remove extra .sh files

* Move filter

* Fix SecurityResourceFilterTest
2019-12-08 18:47:58 +03:00
Clint Wylie 441515cb50 update dump-segment docs so example command works (#8998)
* update dump-segment docs so example command works

* not everyone uses bash
2019-12-07 06:36:46 -08:00
Jonathan Wei c949a25210
Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982)
* Add Druid input source and format

* Inherit dims/metrics from segment

* Add ingest segment firehose reindexing test

* Remove unnecessary module

* Fix unit tests, checkstyle

* Add doc entry

* Fix dimensionExclusions handling, add parallel index integration test

* Add spelling exclusion

* Address some PR comments

* Checkstyle

* wip

* Address rest of PR comments

* Address PR comments
2019-12-05 16:50:00 -08:00
Clint Wylie 5ecdf94d83
add 'prefixes' support to google input source (#8930)
* add prefixes support to google input source, making it symmetrical-ish with s3

* docs

* more better, and tests

* unused

* formatting

* javadoc

* dependencies

* oops

* review comments

* better javadoc
2019-12-04 21:01:10 -08:00
Lucas Capistrant 8dd9a8cb15 Small doc fix for baseTaskDir conf (#8978) 2019-12-04 14:07:03 -08:00
Clint Wylie a48784a1fd dropwizard-emitter doc fixes (#8988) 2019-12-04 12:52:58 -08:00
Fangyuan Deng 187cf0dd3f [Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988)
* historical fast restart by lazy load columns metadata

* delete repeated code

* add documentation for druid.segmentCache.lazyLoadOnStart

* fix unit test fail

* fix spellcheck

* update docs

* update docs mentioning a catch
2019-12-03 09:47:01 -08:00
Jonathan Wei 00ce18a0ea
Additional Kinesis resharding fixes (#8870)
* Additional Kinesis resharding fixes

* Address PR comments

* Remove unused method

* Adjust SegmentTransactionalInsertAction null handling

* Check for unchanged metadata on empty publish

* Add logs for empty publish

* Fix javadoc

* Clear offset when invalid endOffsets are seen

* Fix LGTM alert

* Fix build

* Add resharding note to Kinesis docs

* Checkstyle

* Spelling

* Address PR comments

* Checkstyle
2019-11-28 12:59:01 -08:00
Clint Wylie 4458113375
S3 input source (#8903)
* add s3 input source for native batch ingestion

* add docs

* fixes

* checkstyle

* lazy splits

* fixes and hella tests

* fix it

* re-use better iterator

* use key

* javadoc and checkstyle

* exception

* oops

* refactor to use S3Coords instead of URI

* remove unused code, add retrying stream to handle s3 stream

* remove unused parameter

* update to latest master

* use list of objects instead of object

* serde test

* refactor and such

* now with the ability to compile

* fix signature and javadocs

* fix conflicts yet again, fix S3 uri stuffs

* more tests, enforce uri for bucket

* javadoc

* oops

* abstract class instead of interface

* null or empty

* better error
2019-11-25 22:31:19 -08:00
Jihoon Son a2e6de4b16 Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924)
* Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor

* Fix docs and javadoc

* Add unit tests for large or small estimated num splits

* add override
2019-11-23 01:38:08 -08:00
Clint Wylie 7250010388 add parquet support to native batch (#8883)
* add parquet support to native batch

* cleanup

* implement toJson for sampler support

* better binaryAsString test

* docs

* i hate spellcheck

* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest

* add comment, fix some stuff

* adjustments

* fix accident

* tweaks
2019-11-22 10:49:16 -08:00
SeKing 9955107e8e RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461) 2019-11-22 03:46:54 -08:00
Surekha d628bebbd7 Make supervisor API similar to submit task API (#8810)
* accept spec or dataSchema, tuningConfig, ioConfig while submitting task json

* fix test

* update docs

* lgtm warning

* Add original constructor back to IndexTask to minimize changes

* fix indentation in docs

* Allow spec to be specified in supervisor schema

* undo IndexTask spec changes

* update docs

* Add Nullable and deprecated annotations

* remove deprecated configs from SeekableStreamSupervisorSpec

* remove nullable annotation
2019-11-20 10:04:41 -08:00
Clint Wylie d67c3c7aed document SQL compatible null handling mode (#8894)
* document SQL compatible null handling mode

* adjustments

* fix docs

* review changes
2019-11-20 06:52:20 -08:00
Clint Wylie 074a45219d add google cloud storage InputSource for native batch (#8907)
* add google cloud storage InputSource for native batch

* rename

* checkstyle

* fix

* fix spelling

* review comments
2019-11-19 19:49:43 -08:00
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3:
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12:
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645:
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Chi Cao Minh d60978343a Improve missing JDBC driver error for lookups (#8872)
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
2019-11-18 11:42:38 -08:00
Jihoon Son 1611792855
Add InputSource and InputFormat interfaces (#8823)
* Add InputSource and InputFormat interfaces

* revert orc dependency

* fix dimension exclusions and failing unit tests

* fix tests

* fix test

* fix test

* fix firehose and inputSource for parallel indexing task

* fix tc

* fix tc: remove unused method

* Formattable

* add needsFormat(); renamed to ObjectSource; pass metricsName for reader

* address comments

* fix closing resource

* fix checkstyle

* fix tests

* remove verify from csv

* Revert "remove verify from csv"

This reverts commit 1ea7758489.

* address comments

* fix import order and javadoc

* flatMap

* sampleLine

* Add IntermediateRowParsingReader

* Address comments

* move csv reader test

* remove test for verify

* adjust comments

* Fix InputEntityIteratingReader

* rename source -> entity

* address comments
2019-11-15 09:22:09 -08:00
Clint Wylie cc54b2a9df support for array expressions in TransformSpec with ExpressionTransform (#8744)
* transformSpec + array expressions

changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes

* remove docs about being unsupported

* re-arrange test assert

* unused imports

* imports

* fix tests

* preserve types

* suppress warning, fixes, add test

* formatting

* cleanup

* better list to array type conversion and tests

* fix oops
2019-11-13 11:04:37 -08:00
fst0 80dbf44fca Add reference to druid.storage.type (#8857)
* Add reference to `druid.storage.type`

This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct.

* Update s3.md

Add global storage parameter to knob table.

* Update s3.md
2019-11-13 10:03:41 -08:00
Lucas Capistrant a066cc5648 Fix groupMapping endpoint URIs in druid-basic-security doc (#8847) 2019-11-12 21:12:34 +05:30
Jonathan Wei 75ea0d592a Add more datasketches doubles sketch SQL functions (#8843)
* Add more datasketches doubles sketch SQL postaggs

* style and lgtm
2019-11-08 18:05:06 -08:00
Gian Merlino 0e8c3f74d0 SQL: EARLIEST, LATEST aggregators. (#8815)
* SQL: EARLIEST, LATEST aggregators.

I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.

* Finalify.

* SQL updates.

* Adjust aggregator calls.

* Validations, test updates.

* Review docs.
2019-11-08 16:29:25 -08:00
Clint Wylie 7aafcf8bca parallel broker merges on fork join pool (#8578)
* sketch of broker parallel merges done in small batches on fork join pool

* fix non-terminating sequences, auto compute parallelism

* adjust benches

* adjust benchmarks

* now hella more faster, fixed dumb

* fix

* remove comments

* log.info for debug

* javadoc

* safer block for sequence to yielder conversion

* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool

* smooth yield rate adjustment, more logs to help tune

* cleanup, less logs

* error handling, bug fixes, on by default, more parallel, more tests

* remove unused var

* comments

* timeboundary mergeFn

* simplify, more javadoc

* formatting

* pushdown config

* use nanos consistently, move logs back to debug level, bit more javadoc

* static terminal result batch

* javadoc for nullability of createMergeFn

* cleanup

* oops

* fix race, add docs

* spelling, remove todo, add unhandled exception log

* cleanup, revert unintended change

* another unintended change

* review stuff

* add ParallelMergeCombiningSequenceBenchmark, fixes

* hyper-threading is the enemy

* fix initial start delay, lol

* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2

* fix those important style issues with the benchmarks code

* lazy sequence creation for benchmarks

* more benchmark comments

* stable sequence generation time

* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs

* add jmh thread based benchmarks, cleanup some stuff

* oops

* style

* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose

* retool benchmark to allow modeling more typical heterogenous heavy workloads

* spelling

* fix

* refactor benchmarks

* formatting

* docs

* add maxThreadStartDelay parameter to threaded benchmark

* why does catch need to be on its own line but else doesnt
2019-11-07 11:58:46 -08:00
Jad Naous ce3c0dae4d Add note on JDBC libs for lookups (#8825)
* Add note on JDBC libs for lookups

* Fix directory and additional "the"
2019-11-06 13:31:26 -08:00
Himanshu 5adc8212b4
add documentation for druid docker and k8s operator (#8802)
* add documentation for druid docker and k8s operator

* address review comment and add Kubernetes to spelling file
2019-11-06 12:56:21 -08:00
Tijo Thomas 27acdbd2b8 'hadoop fs' command is deprecated . The new approach is to use hdfs command . Replacing 'hadoop fs' command with 'hdfs dfs' (#8762) 2019-11-01 04:42:10 +05:30
Giuseppe Martino 9c171e2b1f Message rejection absolute date (#8656)
* Add option lateMessageRejectionStartDate

* Use option lateMessageRejectionStartDate

* Fix tests

* Add lateMessageRejectionStartDate to kafka indexing service

* Update tests kafka indexing service

* Fix tests for KafkaSupervisorTest

* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig

* Fix var name

* Update documentation

* Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.
2019-10-31 15:13:02 -07:00
Clint Wylie 3ff5e02237 remove select query (#8739)
* remove select query

* thanks teamcity

* oops

* oops

* add back a SelectQuery class that throws RuntimeExceptions linking to docs

* adjust text

* update docs per review

* deprecated
2019-10-30 19:29:56 -07:00
Gian Merlino 7605c23354 Remove Tranquility configs and certain doc references. (#8793)
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
2019-10-30 16:30:16 -07:00
Gian Merlino c922d2c3c9 Use bundled ZooKeeper in tutorials. (#8792) 2019-10-30 16:17:28 -07:00
Gian Merlino aa81253cf4 Fix typos. (#8767) 2019-10-28 12:47:01 -07:00
Gian Merlino b65d2ac648 Add HDFS firehose (#8754)
* Add HDFS firehose.

* Tests, support for lists of paths.

* Fixups.

* Update list of firehoses.

* Wildcards is a word.
2019-10-28 08:07:38 -07:00
Vadim Ogievetsky f9b94a5db1
Docs: remove self link (#8760)
This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.
2019-10-27 22:33:22 -07:00
Clint Wylie 09f92818d4 update druid expression docs to indicate that array functions do not work at indexing time (#8734)
* update druid expression docs to indicate that array functions are not supported in transformSpec

* fix unrelated spelling check
2019-10-24 22:04:08 -07:00
Eyal Yurman 14e33428f0 Moving Average extention: Add Sum averagers (#8511)
* Add sum averagers.

* avoid casting double to long.
2019-10-24 16:37:24 -07:00
Vadim Ogievetsky cc3650ee3b fix doc headers (#8729) 2019-10-24 11:17:39 -07:00
Jihoon Son f5b9bf5525 Cluster-wide configuration for query vectorization (#8657)
* Cluster-wide configuration for query vectorization

* add doc

* fix build

* fix doc

* rename to QueryConfig and add javadoc

* fix checkstyle

* fix variable names
2019-10-23 21:44:28 +08:00
David Glasser b453fda251 docs: clarify native batch ingestion w/ overlapping segments (#8720)
I was confused by a paragraph in the docs that I myself wrote!
2019-10-22 21:01:56 -07:00
Jad Naous 2ab43aa688 Update tutorial-kerberos-hadoop.md (#8689)
* Update tutorial-kerberos-hadoop.md

Fix up what looks like a bad merge.

* Update tutorial-kerberos-hadoop.md

Fix spelling issues
2019-10-22 14:40:41 -07:00
Abhishek Radhakrishnan 42cfe679f1 Update query result timestamp to match query intervals. (#8717) 2019-10-22 14:39:47 -07:00
Surekha e919eccc4b Update docs to add metadataSegment configs (#8708)
* Add metadataSegment configs to docs

* rearrange in alphabetical order
2019-10-22 01:19:36 -07:00
Kamal Gurala 3ed5f9698a gcs prefix doc fix (#8699) 2019-10-21 08:29:54 -07:00
Surekha 98f59ddd7e Add `sys.supervisors` table to system tables (#8547)
* Add supervisors table to SystemSchema

* Add docs

* fix checkstyle

* fix test

* fix CI

* Add comments

* Fix javadoc teamcity error

* comments

* fix links in docs

* fix links

* rename fullStatus query param to system and remove it from docs
2019-10-18 15:16:42 -07:00
Jonathan Wei d88075237a
Add initial SQL support for non-expression sketch postaggs (#8487)
* Add initial SQL support for non-expression sketch postaggs

* Checkstyle, spotbugs

* checkstyle

* imports

* Update SQL docs

* Checkstyle

* Fix theta sketch operator docs

* PR comments

* Checkstyle fixes

* Add missing entries for HLL sketch module

* PR comments, add round param to HLL estimate operator, fix optional HLL param
2019-10-18 14:59:44 -07:00
Jihoon Son 30c15900be
Auto compaction based on parallel indexing (#8570)
* Auto compaction based on parallel indexing

* javadoc and doc

* typo

* update spell

* addressing comments

* address comments

* fix log

* fix build

* fix test

* increase default max input segment bytes per task

* fix test
2019-10-18 13:24:14 -07:00
Mingming Qiu 2c758ef5ff Support assign tasks to run on different categories of MiddleManagers (#7066)
* Support assign tasks to run on different tiers of MiddleManagers

* address comments

* address comments

* rename tier to category and docs

* doc

* fix doc

* fix spelling errors

* docs
2019-10-17 12:57:19 -07:00
Jad Naous d54d2e1627 Update segments.md (#8693)
Make bullet numbers clearer with parantheses, fix last reference to 2 being interpreted as a bullet point.
2019-10-17 11:55:23 -07:00
Jad Naous 9f4e11df32 Update tutorial-rollup.md (#8687)
At this point there hasn't yet been an explanation in the tutorial of what "segments" are
2019-10-16 20:08:09 -06:00
Jonathan Wei 89ce6384f5
More Kinesis resharding adjustments (#8671)
* More Kinesis resharding adjustments

* Fix TC inspection

* Fix comment'

* Adjust comment, small refactor

* Make repartition transition time configurable

* Add spellcheck exclusion

* Spelling fix
2019-10-15 23:19:17 -07:00
Jihoon Son 4046c86d62
Stateful auto compaction (#8573)
* Stateful auto compaction

* javaodc

* add removed test back

* fix test

* adding indexSpec to compactionState

* fix build

* add lastCompactionState

* address comments

* extract CompactionState

* fix doc

* fix build and test

* Add a task context to store compaction state; add javadoc

* fix it test
2019-10-15 22:57:42 -07:00
Mitch Lloyd 1a78a0c98a Add credentials for ECS (#8651)
* Add credentials for ECS

* Fix import order

* Update S3 authentication methods table

* Update .spelling for new documentation
2019-10-12 09:12:14 -07:00
Abhishek Radhakrishnan d87840d894 Minor updates to documentation. (#8665) 2019-10-12 09:11:03 -07:00
Jihoon Son 96d8523ecb Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571)
* IOConfig for compaction task

* add javadoc, doc, unit test

* fix webconsole test

* add spelling

* address comments

* fix build and test

* address comments
2019-10-09 11:12:00 -07:00
Clint Wylie 8bda3afea4 fix spelling errors triggered by another doc PR (#8653) 2019-10-08 23:43:58 -07:00
Nishant Bangarwa 0853273091 Add tier based usage metrics for historical nodes to help with autoscaling (#8636)
* Add tier based usage metrics for historical nodes to help with druid historical autoscaling

Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful -

tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier
tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier
tier/historical/count - no. of historical nodes available in each tier. Dimension - tier
tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier

* fix unit test failures
2019-10-08 19:55:32 -07:00
Mohammad J. Khan 18758f5228 Support LDAP authentication/authorization (#6972)
* Support LDAP authentication/authorization

* fixed integration-tests

* fixed Travis CI build errors related to druid-security module

* fixed failing test

* fixed failing test header

* added comments, force build

* fixes for strict compilation spotbugs checks

* removed authenticator rolling credential update feature

* removed escalator rolling credential update feature

* fixed teamcity inspection deprecated API usage error

* fixed checkstyle execution error, removed unused import

* removed cached config as part of removing authenticator rolling credential update feature

* removed config bundle entity as part of removing authenticator rolling credential update feature

* refactored ldao configuration

* added support for SSLContext configuration and TLSCertificateChecker

* removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer

* Separate out authorizer checks between metadata-backed store user and LDAP user/groups

* refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks

* fixes build issue

* final review comments updates

* final review comments updates

* fixed LGTM and spellcheck alerts

* Fixed Avatica auth failure error message check

* Updated metadata credentials validator exception message string, replaced DB with metadata store
2019-10-08 17:08:27 -07:00
Clint Wylie 2f20799868 merge recommendations into basic-cluster-tuning, add additional info (#8649)
* merge recommendations into basic-cluster-tuning, add additional info

* stupid sidebar
2019-10-08 16:33:54 -07:00
Himanshu c078ed40fd
groupBy query: optional limit push down to segment scan (#8426)
* groupBy query: optional limit push down to segment scan

* make segment level limit push down configurable

* fix teamcity errors

* fix segment limit pushdown flag handling on query level config override

* use equals for comparator check

* fix sql and null handling

* fix unused imports

* handle null offset in NullableValueGroupByColumnSelectorStrategy for buffer comparator similar to RowBasedGrouperHelper.NullableRowBasedKeySerdeHelper
2019-10-08 15:35:07 -07:00
Lucas Capistrant d801ce2f29 Update rollup table to properly reflect 0.16.0 (#8638)
This table stated that `index_parallel` tasks were best-effort only. However, this changed with #8061 and this documentation update was simply missed.
2019-10-07 12:37:15 -07:00
Xavier Léauté 1d42551d95 Fix statsd types (#8628)
* fix segment underReplicated/unavailable counts to be gauges instead of counters

* fix jvm/gc/cpu to be a counter instead of timre

jvm/gc/cpu represents the total cpu time spent for multiple gc
invocations, not the time spent in each gc cycle.

the number needs to be divided by jvm/gc/count to get the average gc
time per cycle

* update docs

* fix spellcheck
2019-10-06 14:14:09 -07:00
Parag Jain f0d74b240d password provider for basic authentication of HttpEmitterConfig (#8618) 2019-10-02 15:59:17 -07:00
Nishant Bangarwa 8537fbeca7 Implementing dropwizard emitter for druid (#7363)
* Implementing dropwizard emitter for druid

making metric manager and alert emitters as optional

* Refactor and make things work

more improvements

improve docs

refactrings

* Fix teamcity inspections

* review comments

* more review comments

* add limit to max number of gauges

* update pom version

* fix pom

* review comments

* review comment

* review comments

* fix broken doc link

review comments

review comments

* review comments

* fix checkstyle

* more spell check fixes

* fix travis failures
2019-10-01 14:59:30 -07:00
pdeva db65068c42 add reference to indexer nodes (#8607) 2019-09-30 16:45:33 -06:00
Sashidhar Thallam 51a7235ebc Making optimal usage of multiple segment cache locations (#8038)
* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations

* Fixing indentation

* WIP

* Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy

* fixing code style

* Fixing test

* Adding a method visible only for testing, fixing tests

* 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes

* fixing the conditional statement

* Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy

* to trigger CI build

* Add documentation for the selection strategy configuration

* to re trigger CI build

* updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes

* In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail

* Implementing review comments. Added tests for StorageLocationSelectorStrategy

* Checkstyle fixes

* Adding java doc comments for StorageLocationSelectorStrategy interface

* checkstyle

* empty commit to retrigger build

* Empty commit

* Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file

* Impl review comments including updating docs as suggested

* Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose

* Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy

* Fixing the round robin iterator

* Removed numLocationsToTry, updated java docs

* changing property attribute value from tier to type

* Fixing assert messages
2019-09-28 00:17:44 -06:00
Himanshu 9f1f5e115c
doubleMean aggregator to be used at query time (#8459)
* doubleMean aggregator for computing mean

* make docs

* build fixes

* address review comment: handle null args
2019-09-26 08:04:33 -07:00
Nishant Bangarwa a75ddaad9e Add TrustedDomain Authenticator (#8248)
* Add TrustedDomain Authenticator

update javadoc

Add nullable annotations

Add cautionary note

fix travis failure

* add IP to spell checker
2019-09-25 11:25:03 -07:00
Rye f2a444321b Added live reports for Kafka and Native batch task (#8557)
* Added live reports for Kafka and Native batch task

* Removed unused local variables

* Added the missing unit test

* Refine unit test logic, add implementation for HttpRemoteTaskRunner

* checksytle fixes

* Update doc descriptions for updated API

* remove unnecessary files

* Fix spellcheck complaints

* More details for api descriptions
2019-09-23 21:08:36 -07:00
Vadim Ogievetsky 52f3f2c229 fix docs version interpolation (#8568) 2019-09-22 17:38:55 -07:00
Vadim Ogievetsky 94298f7809 Update Kafka loading docs to use the streaming data loader (#8544)
* fix redirects

* remove useless page

* fix Single server reference configurations formatting

* update batch data loading

* update Kafka docs

* fix typos and tests

* add more links

* fix spelling
2019-09-22 15:00:52 -07:00
Chi Cao Minh aeac0d4fd3 Adjust defaults for hashed partitioning (#8565)
* Adjust defaults for hashed partitioning

If neither the partition size nor the number of shards are specified,
default to partitions of 5,000,000 rows (similar to the behavior of
dynamic partitions). Previously, both could be null and cause incorrect
behavior.

Specifying both a partition size and a number of shards now results in
an error instead of ignoring the partition size in favor of using the
number of shards. This is a behavior change that makes it more apparent
to the user that only one of the two properties will be honored
(previously, a message was just logged when the specified partition size
was ignored).

* Fix test

* Handle -1 as null

* Add -1 as null tests for single dim partitioning

* Simplify logic to handle -1 as null

* Address review comments
2019-09-21 20:57:40 -07:00
Chi Cao Minh 99b6eedab5 Rename partition spec fields (#8507)
* Rename partition spec fields

Rename partition spec fields to be consistent across the various types
(hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment
and maxRowsPerSegment in favor of targetPartitionSize and
maxSegmentSize. Consistent and clearer names are easier for users to
understand and use.

Also fix various IntelliJ inspection warnings and doc spelling mistakes.

* Fix test

* Improve docs

* Add targetRowsPerSegment to HashedPartitionsSpec
2019-09-20 14:59:18 -06:00
Xavier Léauté e184d24a74
add support for dogstatsd events in statsd-emitter (#8546)
* add support for dogstatsd events in statsd-emitter
* add option to turn on alert events (off by default)
* updated docs
2019-09-19 08:12:30 -07:00
Chi Cao Minh 7dcbaca658 Spellcheck docs (#8548)
* Spellcheck docs

Fix spelling mistakes in docs and add CI job for running spellcheck on
docs.

* Add missing license header
2019-09-17 12:47:30 -07:00
Vadim Ogievetsky 0490909ab3 Web console: Update web console docs for 0.16.0 (#8530)
* Update webconsole docs

* home view

* fix annotation typo
2019-09-13 09:09:36 -07:00
Clint Wylie 75978e5b98 move google ext docs from contrib to core (#8512)
* move google ext docs from contrib to core

* fix links

* revert unintended change

* more links, add note to example ext doc that it was removed, unlink from sidebar
2019-09-12 09:40:39 -07:00
Jonathan Wei 0145642d8b Move router/indexer config/API docs to main pages (#8510)
* Move router/indexer config/API docs to main pages

* Restore missing properties, fix typo

* Use sentence casing

* Fix broken link
2019-09-11 21:42:58 -07:00
Clint Wylie fb078eea1e
fix web-console build in src distribution, fix kafka doc minimum version (#8502) 2019-09-10 21:01:07 -07:00
Chi Cao Minh 14a8613d69 Exit JVM on curator unhandled errors (#8458)
* Exit JVM on curator unhandled errors

If an unhandled error occurs when curator is talking to ZooKeeper, exit
the JVM in addition to stopping the lifecycle to prevent the process
from being left in a zombie state. With this change,
BoundedExponentialBackoffRetryWithQuit is no longer needed as when
curator exceeds the configured retries, it triggers its unhandled error
listeners. A new "connectionTimeoutMs" CuratorConfig setting is added
mostly to facilitate testing curator unhandled errors, but it may be
useful for users as well.

* Address review comments
2019-09-06 16:43:59 -07:00
Clint Wylie fd58fbc8d3
fix statds dogstatsdServiceAsTag docs example to match behavior (#8477) 2019-09-05 19:05:25 -07:00
SeKing 6a6893b406 Fix operator mistake of expression OR (#8452)
* Add realization for updating version of derived segments in MaterializedView

* add unit test, and change code style for the sake of ease of understanding

* fix document's mistake of expression
2019-09-04 21:27:18 -07:00
Lucas Capistrant bfb02f09f8 Add druid.segmentCache.numBootstrapThreads back to the docs (#8462) 2019-09-04 20:27:17 -07:00
legendtkl 0be4a41c06 Website Doc: fix bash command (#8442)
* fix "gunzip -k" to "gunzip -c"
2019-08-30 22:22:09 -07:00
Clint Wylie 3baf31e9a8 add documentation for group by array based result format (#8416) 2019-08-28 08:30:31 -07:00
Jonathan Wei c626452b47 Add nano-quickstart single server example configuration (#8390)
* Add nano-quickstart single server example configuration

* Use two workers

* Shrink processing buffers
2019-08-24 22:07:20 -07:00
Furkan KAMACI 02fe3db911 Zookeeper version is updated. (#8363)
* Zookeeper version is updated.

* Zookeeper version is updated at licenses.yaml

* licenses.yaml is updated and dependencies are fixed to make the project successfully build.

* Zookeeper versions are fixed at licenses.yaml
2019-08-24 22:00:43 -07:00
Jihoon Son 95fa609615 Fix wrong partitionsSpec type names in the document (#8297)
* Fix wrong type names for partitionsSpec

* add unit tests; add json properties for backward compatibility

* beautify conf names

* remove maxRowsPerSegment from hashed partitionsSpec

* fix doc build
2019-08-23 13:44:58 -07:00
Clint Wylie 7749571a7f order and add more ports to hadoop docker container in hadoop indexing tutorial (#8329)
LGTM
2019-08-23 15:43:06 -05:00
Surekha cf2a2dd917
Add group_id to the sys.tasks table (#8304)
* Add group_id to overlord tasks API and sys.tasks table

* adjust test

* modify docs

* Make groupId nullable

* fix integration test

* fix toString

* Remove groupId from TaskInfo

* Modify docs and tests

* modify TaskMonitorTest
2019-08-22 15:28:23 -07:00
Clint Wylie 010f70b371
autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306)
* migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time

* +x

* move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml

* add missing hdfs-storage dependencies

* revert to old syntax, fixes

* formatting

* update notices for recently updated dependencies
2019-08-21 12:46:27 -07:00
Gian Merlino d007477742
Docusaurus build framework + ingestion doc refresh. (#8311)
* Docusaurus build framework + ingestion doc refresh.

* stick to npm instead of yarn

* fix typos

* restore some _bin

* Adjustments.

* detect and fix redirect anchors

* update anchor lint

* Web-console: remove specific column filters (#8343)

* add clear filter

* update tool kit

* remove usless check

* auto run

* add %

* Fix resource leak (#8337)

* Fix resource leak

* Patch comments

* Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234)

* Fixes from PR review.

* Fix more anchors.

* Preamble nix.

* Fix more anchors, headers

* clean up placeholder page

* add to website lint to travis config

* better broken link checking

* travis fix

* Fixed more broken links

* better redirects

* unfancy catch

* fix LGTM error

* link fixes

* fix md issues

* Addl fixes
2019-08-20 21:48:59 -07:00
Fokko Driesprong d5a19675dd Remove fromPigAvroStorage from the docs (#8340)
This one has been deprecated a while ago
2019-08-20 16:34:55 -07:00
Jonathan Wei dd2e53baf4
Clarify Avro decoder docs (#8302) 2019-08-19 15:37:18 -05:00
Jihoon Son 31af4eb9ad
Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324) 2019-08-16 15:57:13 -07:00
Jihoon Son 5dac6375f3
Add support for parallel native indexing with shuffle for perfect rollup (#8257)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* Add support for parallel native indexing with shuffle for perfect rollup.

* Add comment about volatiles

* fix test

* fix test

* handling missing exceptions

* more clear javadoc for stopGracefully

* unused import

* update javadoc

* Add missing statement in javadoc

* address comments; fix doc

* add javadoc for isGuaranteedRollup

* Rename confusing variable name and fix typos

* fix typos; move fetch() to a better home; fix the expiration time

* add support https
2019-08-15 17:43:35 -07:00
Jihoon Son eeae5d9365 Add a warning about experimental segment locking (#8301)
* Add a warning about experimental segment locking

* fix typo
2019-08-15 16:07:59 -07:00
Jihoon Son a5c9c2950f Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274)
* Add missing tuningConfigs for auto compaciton

* Add doc

* add test
2019-08-13 14:10:26 -05:00
Alexandre Yang 6b4d028b96 [statsd-emitter] Add config to send Druid process/service as tag (#8238)
* [statsd-emitter] Add serviceAsTag option

* [statsd-emitter] Refactor serviceAsTag option

* [statsd-emitter] Update statsd.md

* [statsd-emitter] add default prefix

* [statsd-emitter] update statsd.md

* [statsd-emitter] Remove extra spaces

* [statsd-emitter] Improve docs for config `dogstatsdServiceAsTag`

* [statsd-emitter] Simplify equals() for StatsDEmitterConfig.java

* [statsd-emitter] Add @Nullable for StatsDEmitterConfig.java
2019-08-12 13:18:44 -07:00
Nathan b28e252d9a Minor Spelling Error (#8277)
* Minor Spelling Error

* Update mySQL password in docs

/extensions-core/mysql update druid.metadata.storage.connector.password
2019-08-09 16:06:02 -05:00
Jonathan Wei e88bbe71c0 Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255) 2019-08-07 23:04:07 -07:00
Jonathan Wei 5e57492298 Add docs for CliIndexer as an experimental feature (#8245)
* Experimental CliIndexer docs

* PR comments
2019-08-06 15:57:17 -07:00
Lucas Capistrant e252abedc5 Enable toggling request logging on/off for different query types (#7562)
* Enable ability to toggle SegmentMetadata request logging on/off

* Move SegmentMetadata query log filter to FilteredRequestLogger

* Update documentation to reflect the segment metadata flag moving to the filtered request logger

* Modify patch to allow blacklist of query types to not log to request logger

* Address styling and naming requests following latest code review

* Fix indentation on multiple locations per Druid style rules
2019-08-06 15:47:30 +03:00
Samarth Jain 93cf9d4ad4 SQL support for t-digest based sketch aggregators (#8100)
* SQL support for t-digest based sketch aggregators

* Fix teamcity errors

* Add missing dependencies

* Remove unused dependency

* Address code review comments

* Add checks for compression param
2019-08-05 12:01:42 -07:00
Jihoon Son 1ee828ff49
Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173)
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking

* add more test

* javadoc for missingIntervalsInOverwriteMode

* Fix test

* Address comments

* avoid spotbugs
2019-08-02 20:30:05 -07:00
Chi Cao Minh 4bd3bad8ba Add IPv4 SQL functions (#8223)
* Add IPv4 SQL functions

New SQL functions for filtering IPv4 addresses:
- IPV4_MATCH: Check if IP address belongs to a subnet
- IPV4_PARSE: Convert string IP address to integer
- IPV4_STRINGIFY: Convert integer IP address to string

These are the SQL analogs of the druid expressions with the same name.
Filtering is more efficient when operating on IP addresses as integers
instead of strings.

* Refactor operator conversions into named constants
2019-08-01 21:29:58 -07:00
Clint Wylie 01c8c82982 correct kerberos doc extension load list (#8224) 2019-08-01 17:03:25 -07:00
Chi Cao Minh 7783b31846 Add IPv4 druid expressions (#8197)
* Add IPv4 druid expressions

New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string

These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:

1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4adress_stringify to convert to (readable) string at query
time

* Fix licenses and null handling

* Simplify IPv4 expressions

* Fix tests

* Fix check for valid ipv4 address string
2019-08-01 11:45:04 -07:00
Surekha f0ecdfee30 Fix `is_realtime` column behavior in sys.segments table (#8154)
* Fix is_realtime flag

* make variable final

* minor changes

* Modify is_realtime behavior based on review comment

* Fix UT
2019-07-31 22:26:49 -06:00
Nathan 716ce7fdc7 Spelling Error (#8206) 2019-07-31 10:43:11 -07:00
Jihoon Son 385f492a55
Use PartitionsSpec for all task types (#8141)
* Use partitionsSpec for all task types

* fix doc

* fix typos and revert to use isPushRequired

* address comments

* move partitionsSpec to core

* remove hadoopPartitionsSpec
2019-07-30 17:24:39 -07:00
Clint Wylie 653b558134 sql firehose and firehose doc adjustments (#8067)
* firehose doc adjustments

* fix typo

* additional information on parser types in ingestion docs

* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages

* fixit

* make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers

* transforms

* fix tests
2019-07-30 15:28:10 -07:00
Jonathan Wei 640b7afc1c Add CliIndexer process type and initial task runner implementation (#8107)
* Add CliIndexer process type and initial task runner implementation

* Fix HttpRemoteTaskRunnerTest

* Remove batch sanity check on PeonAppenderatorsManager

* Fix paralle index tests

* PR comments

* Adjust Jersey resource logging

* Additional cleanup

* Fix SystemSchemaTest

* Add comment to LocalDataSegmentPusherTest absolute path test

* More PR comments

* Use Server annotated with RemoteChatHandler

* More PR comments

* Checkstyle

* PR comments

* Add task shutdown to stopGracefully

* Small cleanup

* Compile fix

* Address PR comments

* Adjust TaskReportFileWriter and fix nits

* Remove unnecessary closer

* More PR comments

* Minor adjustments

* PR comments

* ThreadingTaskRunner: cancel  task run future not shutdownFuture and remove thread from workitem
2019-07-29 17:06:33 -07:00
Jihoon Son 61f4abece4
Add more warning to the doc for resetOffsetAutomatically (#8153)
* Add more warnings to the doc for resetOffsetAutomatically

* fix kinesis doc

* fix typos

* revise the description

* capital

* capitalize
2019-07-24 17:37:32 -07:00
Magnus Henoch c87b47e0fa More documentation formatting fixes (#8149)
Add empty lines before bulleted lists and code blocks, to ensure that
they show up properly on the web site.  See also #8079.
2019-07-24 15:26:03 -07:00
Clint Wylie b8b22b7aaa fix references to bin/supervise in tutorial docs (#8087) 2019-07-23 15:05:01 -07:00
Clint Wylie 83514958db remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116)
* remove unecessary lock in ForegroundCachePopulator leading to a lot of contention

* mutableboolean, javadocs,document some cache configs that were missing

* more doc stuff

* adjustments

* remove background documentation
2019-07-23 10:57:59 -07:00
Sashidhar Thallam ea4bad7836 Druid SQL EXTRACT time function - adding support for additional Time Units (#8068)
* 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935

* 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds

* Fixing testInformationSchemaTables test

* Fixing failing tests in DruidAvaticaHandlerTest

* Adding cannotVectorize() call before the test

* Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes.

* Adding MILLISECOND in test case

* Adding support DECADE and MILLENNIUM, updating test case and documentation

* Fixing expression eval for DECADE and MILLENIUM
2019-07-19 20:38:32 -07:00
Roman Leventov ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Magnus Henoch 179253a2fc Fix documentation formatting (#8079)
The Markdown dialect used when publishing the documentation to the web
site is much more sensitive than Github-flavoured Markdown.  In
particular, it requires an empty line before code blocks (unless the
code block starts right after a heading), otherwise the code block
gets formatted in-line with the previous paragraph.  Likewise for
bullet-point lists.
2019-07-15 09:55:18 -07:00
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Chi Cao Minh da3d141dd2 Add inline firehose (#8056)
* Add inline firehose

To allow users to quickly parsing and schema, add a firehose that reads
data that is inlined in its spec.

* Address review comments

* Remove suppression of sonar warnings
2019-07-11 21:43:46 -07:00
Atul Mohan 631cda649b Include replicated segment size property for datasources endpoint (#8039)
* Add replication size

* Summon comma
2019-07-11 01:10:38 -07:00
Himanshu 14aec7fcec
add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)
* disable all compression in intermediate segment persists while ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Jihoon Son 0a3538b569 Fix license check in travis and make it optional (#8049)
* Fix license check in travis and make it optional

* debug

* fix build

* too loud maven

* move MAVEN_OPTS to top and add comments

* adjust script

* remove mvn option from python script
2019-07-09 19:35:29 -07:00
Sashidhar Thallam 3353da2974 Adding missing docs for druid.indexer.logs.disableAcl (#8046) 2019-07-09 16:11:25 -07:00
Jihoon Son 12f12676e3
Binary license management system (#7998)
* Binary license management system

* add missing file

* add comment

* Address comments

* print missing licenses

* print druid module name

* Add missing licenses and update versions

* fix library versions and add missing ones. also fix pom.xml

* testing multi thread

* Parallel report generation

* fix build error

* install pyyaml and use old api

* install python3

* fix travis script

* python3.6

* pip

* setuptools

* python3-setuptools

* address comment

* error on not found reports or registered licenses

* removed licenses

* debug

* travis debug

* add missing licenses

* travis debug

* debug

* remove debug code

* test build script

* travis debug

* still debug

* add missing python lib

* debug

* debug

* fix travis

* fix travis

* debug travis

* flush print

* print something more to keep travis alive

* adjust print

* single threaded

* single threaded

* debug

* debug

* remove debug

* remove deprecated-2017Q4 from travis conf

* remove comments and duplicate sudo
2019-07-08 12:24:51 -07:00
Eyal Yurman 2eee711653 Add missing reference to Materialized-View extension. (#8003)
* Reference Materialized View extension from extensions page.

* Add comma
2019-07-06 13:50:41 -07:00
Dinesh Sawant 9c7c7c58ae Fix overlord port in delete data tutorial (#8037)
In Single-Server Quickstart tutorial the overlord and coordinator
is started as one process on port 8081. But in delete data tutorial the kill
task is sent to 8090 port, which fails.
2019-07-06 08:50:01 -07:00
Chi Cao Minh 0ded0ce414 Add round support for DS-HLL (#8023)
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
2019-07-05 15:37:58 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Gian Merlino 613f09b45a SQL: Add TIME_CEIL function. (#8027)
Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to
share more code.
2019-07-04 15:40:03 -07:00
Clint Wylie 3b84246cd6 add SQL docs for multi-value string dimensions (#8011)
* add SQL docs for multi-value string dimensions

* formatting consistency

* fix typo

* adjust
2019-07-03 08:22:33 -07:00
Clint Wylie c556d44a19
more sql support for expression array functions (#7974)
* more sql support for expression array functions

* prepend/slice

* doc fixes

* fix imports

* fix tests

* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval

* re-arrange

* imports :(

* add append/prepend test
2019-07-02 21:39:26 -07:00