druid

Commit Graph

Author	SHA1	Message	Date
sthetland	83ddc8de1e	Update data-formats.md (#9238 ) * Update data-formats.md Field error and light rewording of new Avro material (and working through the doc authoring process). * Update data-formats.md Make default statements consistent. Future change: s/=/is.	2020-01-22 15:00:53 -08:00
Clint Wylie	8011211a0c	first/last aggregators and nulls (#9161 ) * null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs * initially null or not based on config * review stuff, make string first/last consistent with null handling of numeric columns, more tests * docs * handle nil selectors, revert to primitive first/last types so groupby v1 works...	2020-01-20 11:51:54 -08:00
Suneet Saldanha	180c622e0f	Minor doc updates (#9217 ) * update string first last aggs * update kafka ingestion specs in docs * remove unnecessary parser spec	2020-01-20 11:34:37 -08:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Suneet Saldanha	93167188ea	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:49:33 -08:00
Jihoon Son	153495068b	Doc update for the new input source and the new input format (#9171 ) * Doc update for new input source and input format. - The input source and input format are promoted in all docs under docs/ingestion - All input sources including core extension ones are located in docs/ingestion/native-batch.md - All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md - New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md * parquet * add warning for range partitioning with sequential mode * hdfs + s3, gs * add fs impl for gs * address comments * address comments * gcs	2020-01-17 15:52:05 -08:00
singh	936b9bdfd0	add deets about the keyfile (#9209 )	2020-01-17 11:24:49 -08:00
Maytas Monsereenusorn	42359c93dd	Implement ANY aggregator (#9187 ) * Implement ANY aggregator * Add copyright headers * Add unit tests * fix BufferAggregator * Fix bug in BufferAggregator * hook up the SQL command * add check for buffer aggregator * Address comment * address comments * add docs * Address comments * add more tests for numeric columns that have null values when run in sql compatible null mode * fix checkstyle errors * fix failing tests * fix failing tests	2020-01-16 14:40:32 -08:00
Suneet Saldanha	92ac22d060	Link javaOpts to middlemanager runtime.properties docs (#9101 ) * Link javaOpts to middlemanager runtime.properties docs * fix broken link * reword config links	2020-01-15 21:22:49 -08:00
Suneet Saldanha	85a3d416b0	Tutorials use new ingestion spec where possible (#9155 ) * Tutorials use new ingestion spec where possible There are 2 main changes * Use task type index_parallel instead of index * Remove the use of parser + firehose in favor of inputFormat + inputSource index_parallel is the preferred method starting in 0.17. Setting the job to index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent of an index task Instead of using a parserSpec, dimensionSpec and timestampSpec have been promoted to the dataSchema. The format is described in the ioConfig as the inputFormat. There are a few cases where the new format is not supported * Hadoop must use firehoses instead of the inputSource and inputFormat * There is no equivalent of a combining firehose as an inputSource * A Combining firehose does not support index_parallel * fix typo	2020-01-15 14:08:29 -08:00
Jonathan Wei	d1500c1328	Update Kinesis resharding information about task failures (#9104 )	2020-01-07 15:44:48 -08:00
Jonathan Wei	58d337186b	Graduation update for ASF release process guide and download links (#9126 ) * Graduation update for ASF release process guide and download links * Fix release vote thread typo * Fix pom.xml	2020-01-06 15:00:33 -06:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jihoon Son	3c31493772	Add missing docs for http client configurations (#9054 ) * Add missing docs for http client configurations * fix typo * backticks	2019-12-19 17:41:04 -08:00
Chi Cao Minh	6178f05da6	Fail superbatch range partition multi dim values (#9058 ) * Fail superbatch range partition multi dim values Change the behavior of parallel indexing range partitioning to fail ingestion if any row had multiple values for the partition dimension. After this change, the behavior matches that of hadoop indexing. (Previously, rows with multiple dimension values would be skipped.) * Improve err msg, rename method, rename test class	2019-12-18 10:14:03 -08:00
Clint Wylie	6881535b48	docs - clarify cache parameters (#9020 )	2019-12-13 16:53:45 -08:00
Suneet Saldanha	3325da1718	Allow startup scripts to specify java home (#9021 ) * Allow startup scripts to specify java home The startup scripts now look for java in 3 locations. The order is from most related to druid to least, ie ${DRUID_JAVA_HOME} ${JAVA_HOME} ${PATH} * Update fn names and clean up code * final round of fixes * fix spellcheck	2019-12-12 21:36:00 -08:00
Himanshu	9236dd9467	optionally enable Jetty ForwardedRequestCustomizer (#9010 ) * optionally enable Jetty ForwardedRequestCustomizer * fix doc build	2019-12-12 17:00:08 -08:00
Benjamin Hopp	13c33c1766	Update architecture.md (#9015 )	2019-12-11 19:05:50 -08:00
Jihoon Son	e5e1e9c4ee	Fix broken master (#9005 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test	2019-12-11 15:56:36 -08:00
Parag Jain	24fe824055	add readiness endpoints to processes having initialization delays (#8841 )	2019-12-10 17:26:13 -08:00
Chi Cao Minh	3de7ab8523	DataSketches jars in core (#9003 ) Having DataSketches jars in core will allow potential improvements, for example: - Provide an alternative implementation of HLL: https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html - Range partitioning for native parallel batch indexing without having the user load extensions on the classpath Dev mailing list discussion: https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E	2019-12-10 14:02:34 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Vadim Ogievetsky	0330744793	Docs: bold Java 8 requirement (#8996 ) * bold Java 8 req * add warning box	2019-12-09 20:23:07 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	441515cb50	update dump-segment docs so example command works (#8998 ) * update dump-segment docs so example command works * not everyone uses bash	2019-12-07 06:36:46 -08:00
Jonathan Wei	c949a25210	Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982 ) * Add Druid input source and format * Inherit dims/metrics from segment * Add ingest segment firehose reindexing test * Remove unnecessary module * Fix unit tests, checkstyle * Add doc entry * Fix dimensionExclusions handling, add parallel index integration test * Add spelling exclusion * Address some PR comments * Checkstyle * wip * Address rest of PR comments * Address PR comments	2019-12-05 16:50:00 -08:00
Clint Wylie	5ecdf94d83	add 'prefixes' support to google input source (#8930 ) * add prefixes support to google input source, making it symmetrical-ish with s3 * docs * more better, and tests * unused * formatting * javadoc * dependencies * oops * review comments * better javadoc	2019-12-04 21:01:10 -08:00
Lucas Capistrant	8dd9a8cb15	Small doc fix for baseTaskDir conf (#8978 )	2019-12-04 14:07:03 -08:00
Clint Wylie	a48784a1fd	dropwizard-emitter doc fixes (#8988 )	2019-12-04 12:52:58 -08:00
Fangyuan Deng	187cf0dd3f	[Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988 ) * historical fast restart by lazy load columns metadata * delete repeated code * add documentation for druid.segmentCache.lazyLoadOnStart * fix unit test fail * fix spellcheck * update docs * update docs mentioning a catch	2019-12-03 09:47:01 -08:00
Jonathan Wei	00ce18a0ea	Additional Kinesis resharding fixes (#8870 ) * Additional Kinesis resharding fixes * Address PR comments * Remove unused method * Adjust SegmentTransactionalInsertAction null handling * Check for unchanged metadata on empty publish * Add logs for empty publish * Fix javadoc * Clear offset when invalid endOffsets are seen * Fix LGTM alert * Fix build * Add resharding note to Kinesis docs * Checkstyle * Spelling * Address PR comments * Checkstyle	2019-11-28 12:59:01 -08:00
Clint Wylie	4458113375	S3 input source (#8903 ) * add s3 input source for native batch ingestion * add docs * fixes * checkstyle * lazy splits * fixes and hella tests * fix it * re-use better iterator * use key * javadoc and checkstyle * exception * oops * refactor to use S3Coords instead of URI * remove unused code, add retrying stream to handle s3 stream * remove unused parameter * update to latest master * use list of objects instead of object * serde test * refactor and such * now with the ability to compile * fix signature and javadocs * fix conflicts yet again, fix S3 uri stuffs * more tests, enforce uri for bucket * javadoc * oops * abstract class instead of interface * null or empty * better error	2019-11-25 22:31:19 -08:00
Jihoon Son	a2e6de4b16	Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924 ) * Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor * Fix docs and javadoc * Add unit tests for large or small estimated num splits * add override	2019-11-23 01:38:08 -08:00
Clint Wylie	7250010388	add parquet support to native batch (#8883 ) * add parquet support to native batch * cleanup * implement toJson for sampler support * better binaryAsString test * docs * i hate spellcheck * refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest * add comment, fix some stuff * adjustments * fix accident * tweaks	2019-11-22 10:49:16 -08:00
SeKing	9955107e8e	RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461 )	2019-11-22 03:46:54 -08:00
Surekha	d628bebbd7	Make supervisor API similar to submit task API (#8810 ) * accept spec or dataSchema, tuningConfig, ioConfig while submitting task json * fix test * update docs * lgtm warning * Add original constructor back to IndexTask to minimize changes * fix indentation in docs * Allow spec to be specified in supervisor schema * undo IndexTask spec changes * update docs * Add Nullable and deprecated annotations * remove deprecated configs from SeekableStreamSupervisorSpec * remove nullable annotation	2019-11-20 10:04:41 -08:00
Clint Wylie	d67c3c7aed	document SQL compatible null handling mode (#8894 ) * document SQL compatible null handling mode * adjustments * fix docs * review changes	2019-11-20 06:52:20 -08:00
Clint Wylie	074a45219d	add google cloud storage InputSource for native batch (#8907 ) * add google cloud storage InputSource for native batch * rename * checkstyle * fix * fix spelling * review comments	2019-11-19 19:49:43 -08:00
Chi Cao Minh	8365bdf62a	Address security vulnerabilities (#8878 ) * Address security vulnerabilities Security vulnerabilities addressed by upgrading 3rd party libs: - Upgrade avro-ipc to 1.9.1 - sonatype-2019-0115 - Upgrade caffeine to 2.8.0 - sonatype-2019-0282 - Upgrade commons-beanutils to 1.9.4 - CVE-2014-0114 - Upgrade commons-codec to 1.13 - sonatype-2012-0050 - Upgrade commons-compress to 1.19 - CVE-2019-12402 - sonatype-2018-0293 - Upgrade hadoop-common to 2.8.5 - CVE-2018-11767 - Upgrade hadoop-mapreduce-client-core to 2.8.5 - CVE-2017-3166 - Upgrade hibernate-validator to 5.2.5 - CVE-2017-7536 - Upgrade httpclient to 4.5.10 - sonatype-2017-0359 - Upgrade icu4j to 55.1 - CVE-2014-8147 - Upgrade jackson-databind to 2.6.7.3: - CVE-2017-7525 - Upgrade jetty-http to 9.4.12: - CVE-2017-7657 - CVE-2017-7658 - CVE-2017-7656 - CVE-2018-12545 - Upgrade log4j-core to 2.8.2 - CVE-2017-5645: - Upgrade netty to 3.10.6 - CVE-2015-2156 - Upgrade netty-common to 4.1.42 - CVE-2019-9518 - Upgrade netty-codec-http to 4.1.42 - CVE-2019-16869 - Upgrade nimbus-jose-jwt to 4.41.1 - CVE-2017-12972 - CVE-2017-12974 - Upgrade plexus-utils to 3.0.24 - CVE-2017-1000487 - sonatype-2015-0173 - sonatype-2016-0398 - Upgrade postgresql to 42.2.8 - CVE-2018-10936 Note that if users are using JDBC lookups with postgres, they may need to update the JDBC jar used by the lookup extension. * Fix license for postgresql	2019-11-19 09:14:33 -08:00
Chi Cao Minh	d60978343a	Improve missing JDBC driver error for lookups (#8872 ) If the JDBC drivers are missing from the lookup extensions, throw an exception that directs the user how to resolve the issue. This change is a follow up to #8825.	2019-11-18 11:42:38 -08:00
Jihoon Son	1611792855	Add InputSource and InputFormat interfaces (#8823 ) * Add InputSource and InputFormat interfaces * revert orc dependency * fix dimension exclusions and failing unit tests * fix tests * fix test * fix test * fix firehose and inputSource for parallel indexing task * fix tc * fix tc: remove unused method * Formattable * add needsFormat(); renamed to ObjectSource; pass metricsName for reader * address comments * fix closing resource * fix checkstyle * fix tests * remove verify from csv * Revert "remove verify from csv" This reverts commit `1ea7758489`. * address comments * fix import order and javadoc * flatMap * sampleLine * Add IntermediateRowParsingReader * Address comments * move csv reader test * remove test for verify * adjust comments * Fix InputEntityIteratingReader * rename source -> entity * address comments	2019-11-15 09:22:09 -08:00
Clint Wylie	cc54b2a9df	support for array expressions in TransformSpec with ExpressionTransform (#8744 ) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops	2019-11-13 11:04:37 -08:00
fst0	80dbf44fca	Add reference to druid.storage.type (#8857 ) * Add reference to `druid.storage.type` This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct. * Update s3.md Add global storage parameter to knob table. * Update s3.md	2019-11-13 10:03:41 -08:00
Lucas Capistrant	a066cc5648	Fix groupMapping endpoint URIs in druid-basic-security doc (#8847 )	2019-11-12 21:12:34 +05:30
Jonathan Wei	75ea0d592a	Add more datasketches doubles sketch SQL functions (#8843 ) * Add more datasketches doubles sketch SQL postaggs * style and lgtm	2019-11-08 18:05:06 -08:00
Gian Merlino	0e8c3f74d0	SQL: EARLIEST, LATEST aggregators. (#8815 ) * SQL: EARLIEST, LATEST aggregators. I chose these names instead of FIRST, LAST because those are already reserved functions in Calcite that mean something different. I think these are also better names anyway. * Finalify. * SQL updates. * Adjust aggregator calls. * Validations, test updates. * Review docs.	2019-11-08 16:29:25 -08:00
Clint Wylie	7aafcf8bca	parallel broker merges on fork join pool (#8578 ) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt	2019-11-07 11:58:46 -08:00
Jad Naous	ce3c0dae4d	Add note on JDBC libs for lookups (#8825 ) * Add note on JDBC libs for lookups * Fix directory and additional "the"	2019-11-06 13:31:26 -08:00
Himanshu	5adc8212b4	add documentation for druid docker and k8s operator (#8802 ) * add documentation for druid docker and k8s operator * address review comment and add Kubernetes to spelling file	2019-11-06 12:56:21 -08:00
Tijo Thomas	27acdbd2b8	'hadoop fs' command is deprecated. The new approach is to use hdfs command. Replacing 'hadoop fs' command with 'hdfs dfs' (#8762 )	2019-11-01 04:42:10 +05:30
Giuseppe Martino	9c171e2b1f	Message rejection absolute date (#8656 ) * Add option lateMessageRejectionStartDate * Use option lateMessageRejectionStartDate * Fix tests * Add lateMessageRejectionStartDate to kafka indexing service * Update tests kafka indexing service * Fix tests for KafkaSupervisorTest * Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig * Fix var name * Update documentation * Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.	2019-10-31 15:13:02 -07:00
Clint Wylie	3ff5e02237	remove select query (#8739 ) * remove select query * thanks teamcity * oops * oops * add back a SelectQuery class that throws RuntimeExceptions linking to docs * adjust text * update docs per review * deprecated	2019-10-30 19:29:56 -07:00
Gian Merlino	7605c23354	Remove Tranquility configs and certain doc references. (#8793 ) Since it hasn't received updates or community interest in a while, it makes sense to de-emphasize it in the distribution and most documentation (outside of simple mentions of its existence).	2019-10-30 16:30:16 -07:00
Gian Merlino	c922d2c3c9	Use bundled ZooKeeper in tutorials. (#8792 )	2019-10-30 16:17:28 -07:00
Gian Merlino	aa81253cf4	Fix typos. (#8767 )	2019-10-28 12:47:01 -07:00
Gian Merlino	b65d2ac648	Add HDFS firehose (#8754 ) * Add HDFS firehose. * Tests, support for lists of paths. * Fixups. * Update list of firehoses. * Wildcards is a word.	2019-10-28 08:07:38 -07:00
Vadim Ogievetsky	f9b94a5db1	Docs: remove self link (#8760 ) This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.	2019-10-27 22:33:22 -07:00
Clint Wylie	09f92818d4	update druid expression docs to indicate that array functions do not work at indexing time (#8734 ) * update druid expression docs to indicate that array functions are not supported in transformSpec * fix unrelated spelling check	2019-10-24 22:04:08 -07:00
Eyal Yurman	14e33428f0	Moving Average extension: Add Sum averagers (#8511 ) * Add sum averagers. * avoid casting double to long.	2019-10-24 16:37:24 -07:00
Vadim Ogievetsky	cc3650ee3b	fix doc headers (#8729 )	2019-10-24 11:17:39 -07:00
Jihoon Son	f5b9bf5525	Cluster-wide configuration for query vectorization (#8657 ) * Cluster-wide configuration for query vectorization * add doc * fix build * fix doc * rename to QueryConfig and add javadoc * fix checkstyle * fix variable names	2019-10-23 21:44:28 +08:00
David Glasser	b453fda251	docs: clarify native batch ingestion w/ overlapping segments (#8720 ) I was confused by a paragraph in the docs that I myself wrote!	2019-10-22 21:01:56 -07:00
Jad Naous	2ab43aa688	Update tutorial-kerberos-hadoop.md (#8689 ) * Update tutorial-kerberos-hadoop.md Fix up what looks like a bad merge. * Update tutorial-kerberos-hadoop.md Fix spelling issues	2019-10-22 14:40:41 -07:00
Abhishek Radhakrishnan	42cfe679f1	Update query result timestamp to match query intervals. (#8717 )	2019-10-22 14:39:47 -07:00
Surekha	e919eccc4b	Update docs to add metadataSegment configs (#8708 ) * Add metadataSegment configs to docs * rearrange in alphabetical order	2019-10-22 01:19:36 -07:00
Kamal Gurala	3ed5f9698a	gcs prefix doc fix (#8699 )	2019-10-21 08:29:54 -07:00
Surekha	98f59ddd7e	Add `sys.supervisors` table to system tables (#8547 ) * Add supervisors table to SystemSchema * Add docs * fix checkstyle * fix test * fix CI * Add comments * Fix javadoc teamcity error * comments * fix links in docs * fix links * rename fullStatus query param to system and remove it from docs	2019-10-18 15:16:42 -07:00
Jonathan Wei	d88075237a	Add initial SQL support for non-expression sketch postaggs (#8487 ) * Add initial SQL support for non-expression sketch postaggs * Checkstyle, spotbugs * checkstyle * imports * Update SQL docs * Checkstyle * Fix theta sketch operator docs * PR comments * Checkstyle fixes * Add missing entries for HLL sketch module * PR comments, add round param to HLL estimate operator, fix optional HLL param	2019-10-18 14:59:44 -07:00
Jihoon Son	30c15900be	Auto compaction based on parallel indexing (#8570 ) * Auto compaction based on parallel indexing * javadoc and doc * typo * update spell * addressing comments * address comments * fix log * fix build * fix test * increase default max input segment bytes per task * fix test	2019-10-18 13:24:14 -07:00
Mingming Qiu	2c758ef5ff	Support assign tasks to run on different categories of MiddleManagers (#7066 ) * Support assign tasks to run on different tiers of MiddleManagers * address comments * address comments * rename tier to category and docs * doc * fix doc * fix spelling errors * docs	2019-10-17 12:57:19 -07:00
Jad Naous	d54d2e1627	Update segments.md (#8693 ) Make bullet numbers clearer with parentheses, fix last reference to 2 being interpreted as a bullet point.	2019-10-17 11:55:23 -07:00
Jad Naous	9f4e11df32	Update tutorial-rollup.md (#8687 ) At this point there hasn't yet been an explanation in the tutorial of what "segments" are	2019-10-16 20:08:09 -06:00
Jonathan Wei	89ce6384f5	More Kinesis resharding adjustments (#8671 ) * More Kinesis resharding adjustments * Fix TC inspection * Fix comment' * Adjust comment, small refactor * Make repartition transition time configurable * Add spellcheck exclusion * Spelling fix	2019-10-15 23:19:17 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Mitch Lloyd	1a78a0c98a	Add credentials for ECS (#8651 ) * Add credentials for ECS * Fix import order * Update S3 authentication methods table * Update .spelling for new documentation	2019-10-12 09:12:14 -07:00
Abhishek Radhakrishnan	d87840d894	Minor updates to documentation. (#8665 )	2019-10-12 09:11:03 -07:00
Jihoon Son	96d8523ecb	Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571 ) * IOConfig for compaction task * add javadoc, doc, unit test * fix webconsole test * add spelling * address comments * fix build and test * address comments	2019-10-09 11:12:00 -07:00
Clint Wylie	8bda3afea4	fix spelling errors triggered by another doc PR (#8653 )	2019-10-08 23:43:58 -07:00
Nishant Bangarwa	0853273091	Add tier based usage metrics for historical nodes to help with autoscaling (#8636 ) * Add tier based usage metrics for historical nodes to help with druid historical autoscaling Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful - tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier tier/historical/count - no. of historical nodes available in each tier. Dimension - tier tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier * fix unit test failures	2019-10-08 19:55:32 -07:00
Mohammad J. Khan	18758f5228	Support LDAP authentication/authorization (#6972 ) * Support LDAP authentication/authorization * fixed integration-tests * fixed Travis CI build errors related to druid-security module * fixed failing test * fixed failing test header * added comments, force build * fixes for strict compilation spotbugs checks * removed authenticator rolling credential update feature * removed escalator rolling credential update feature * fixed teamcity inspection deprecated API usage error * fixed checkstyle execution error, removed unused import * removed cached config as part of removing authenticator rolling credential update feature * removed config bundle entity as part of removing authenticator rolling credential update feature * refactored ldao configuration * added support for SSLContext configuration and TLSCertificateChecker * removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer * Separate out authorizer checks between metadata-backed store user and LDAP user/groups * refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks * fixes build issue * final review comments updates * final review comments updates * fixed LGTM and spellcheck alerts * Fixed Avatica auth failure error message check * Updated metadata credentials validator exception message string, replaced DB with metadata store	2019-10-08 17:08:27 -07:00
Clint Wylie	2f20799868	merge recommendations into basic-cluster-tuning, add additional info (#8649 ) * merge recommendations into basic-cluster-tuning, add additional info * stupid sidebar	2019-10-08 16:33:54 -07:00
Himanshu	c078ed40fd	groupBy query: optional limit push down to segment scan (#8426 ) * groupBy query: optional limit push down to segment scan * make segment level limit push down configurable * fix teamcity errors * fix segment limit pushdown flag handling on query level config override * use equals for comparator check * fix sql and null handling * fix unused imports * handle null offset in NullableValueGroupByColumnSelectorStrategy for buffer comparator similar to RowBasedGrouperHelper.NullableRowBasedKeySerdeHelper	2019-10-08 15:35:07 -07:00
Lucas Capistrant	d801ce2f29	Update rollup table to properly reflect 0.16.0 (#8638 ) This table stated that `index_parallel` tasks were best-effort only. However, this changed with #8061 and this documentation update was simply missed.	2019-10-07 12:37:15 -07:00
Xavier Léauté	1d42551d95	Fix statsd types (#8628 ) * fix segment underReplicated/unavailable counts to be gauges instead of counters * fix jvm/gc/cpu to be a counter instead of timre jvm/gc/cpu represents the total cpu time spent for multiple gc invocations, not the time spent in each gc cycle. the number needs to be divided by jvm/gc/count to get the average gc time per cycle * update docs * fix spellcheck	2019-10-06 14:14:09 -07:00
Parag Jain	f0d74b240d	password provider for basic authentication of HttpEmitterConfig (#8618 )	2019-10-02 15:59:17 -07:00
Nishant Bangarwa	8537fbeca7	Implementing dropwizard emitter for druid (#7363 ) * Implementing dropwizard emitter for druid making metric manager and alert emitters as optional * Refactor and make things work more improvements improve docs refactrings * Fix teamcity inspections * review comments * more review comments * add limit to max number of gauges * update pom version * fix pom * review comments * review comment * review comments * fix broken doc link review comments review comments * review comments * fix checkstyle * more spell check fixes * fix travis failures	2019-10-01 14:59:30 -07:00
pdeva	db65068c42	add reference to indexer nodes (#8607 )	2019-09-30 16:45:33 -06:00
Sashidhar Thallam	51a7235ebc	Making optimal usage of multiple segment cache locations (#8038 ) * #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages	2019-09-28 00:17:44 -06:00
Himanshu	9f1f5e115c	doubleMean aggregator to be used at query time (#8459 ) * doubleMean aggregator for computing mean * make docs * build fixes * address review comment: handle null args	2019-09-26 08:04:33 -07:00
Nishant Bangarwa	a75ddaad9e	Add TrustedDomain Authenticator (#8248 ) * Add TrustedDomain Authenticator update javadoc Add nullable annotations Add cautionary note fix travis failure * add IP to spell checker	2019-09-25 11:25:03 -07:00
Rye	f2a444321b	Added live reports for Kafka and Native batch task (#8557 ) * Added live reports for Kafka and Native batch task * Removed unused local variables * Added the missing unit test * Refine unit test logic, add implementation for HttpRemoteTaskRunner * checksytle fixes * Update doc descriptions for updated API * remove unnecessary files * Fix spellcheck complaints * More details for api descriptions	2019-09-23 21:08:36 -07:00
Vadim Ogievetsky	52f3f2c229	fix docs version interpolation (#8568 )	2019-09-22 17:38:55 -07:00
Vadim Ogievetsky	94298f7809	Update Kafka loading docs to use the streaming data loader (#8544 ) * fix redirects * remove useless page * fix Single server reference configurations formatting * update batch data loading * update Kafka docs * fix typos and tests * add more links * fix spelling	2019-09-22 15:00:52 -07:00
Chi Cao Minh	aeac0d4fd3	Adjust defaults for hashed partitioning (#8565 ) * Adjust defaults for hashed partitioning If neither the partition size nor the number of shards are specified, default to partitions of 5,000,000 rows (similar to the behavior of dynamic partitions). Previously, both could be null and cause incorrect behavior. Specifying both a partition size and a number of shards now results in an error instead of ignoring the partition size in favor of using the number of shards. This is a behavior change that makes it more apparent to the user that only one of the two properties will be honored (previously, a message was just logged when the specified partition size was ignored). * Fix test * Handle -1 as null * Add -1 as null tests for single dim partitioning * Simplify logic to handle -1 as null * Address review comments	2019-09-21 20:57:40 -07:00
Chi Cao Minh	99b6eedab5	Rename partition spec fields (#8507 ) * Rename partition spec fields Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment and maxRowsPerSegment in favor of targetPartitionSize and maxSegmentSize. Consistent and clearer names are easier for users to understand and use. Also fix various IntelliJ inspection warnings and doc spelling mistakes. * Fix test * Improve docs * Add targetRowsPerSegment to HashedPartitionsSpec	2019-09-20 14:59:18 -06:00
Xavier Léauté	e184d24a74	add support for dogstatsd events in statsd-emitter (#8546 ) * add support for dogstatsd events in statsd-emitter * add option to turn on alert events (off by default) * updated docs	2019-09-19 08:12:30 -07:00
Chi Cao Minh	7dcbaca658	Spellcheck docs (#8548 ) * Spellcheck docs Fix spelling mistakes in docs and add CI job for running spellcheck on docs. * Add missing license header	2019-09-17 12:47:30 -07:00
Vadim Ogievetsky	0490909ab3	Web console: Update web console docs for 0.16.0 (#8530 ) * Update webconsole docs * home view * fix annotation typo	2019-09-13 09:09:36 -07:00
Clint Wylie	75978e5b98	move google ext docs from contrib to core (#8512 ) * move google ext docs from contrib to core * fix links * revert unintended change * more links, add note to example ext doc that it was removed, unlink from sidebar	2019-09-12 09:40:39 -07:00
Jonathan Wei	0145642d8b	Move router/indexer config/API docs to main pages (#8510 ) * Move router/indexer config/API docs to main pages * Restore missing properties, fix typo * Use sentence casing * Fix broken link	2019-09-11 21:42:58 -07:00
Clint Wylie	fb078eea1e	fix web-console build in src distribution, fix kafka doc minimum version (#8502 )	2019-09-10 21:01:07 -07:00
Chi Cao Minh	14a8613d69	Exit JVM on curator unhandled errors (#8458 ) * Exit JVM on curator unhandled errors If an unhandled error occurs when curator is talking to ZooKeeper, exit the JVM in addition to stopping the lifecycle to prevent the process from being left in a zombie state. With this change, BoundedExponentialBackoffRetryWithQuit is no longer needed as when curator exceeds the configured retries, it triggers its unhandled error listeners. A new "connectionTimeoutMs" CuratorConfig setting is added mostly to facilitate testing curator unhandled errors, but it may be useful for users as well. * Address review comments	2019-09-06 16:43:59 -07:00
Clint Wylie	fd58fbc8d3	fix statds dogstatsdServiceAsTag docs example to match behavior (#8477 )	2019-09-05 19:05:25 -07:00
SeKing	6a6893b406	Fix operator mistake of expression OR (#8452 ) * Add realization for updating version of derived segments in MaterializedView * add unit test, and change code style for the sake of ease of understanding * fix document's mistake of expression	2019-09-04 21:27:18 -07:00
Lucas Capistrant	bfb02f09f8	Add druid.segmentCache.numBootstrapThreads back to the docs (#8462 )	2019-09-04 20:27:17 -07:00
legendtkl	0be4a41c06	Website Doc: fix bash command (#8442 ) * fix "gunzip -k" to "gunzip -c"	2019-08-30 22:22:09 -07:00
Clint Wylie	3baf31e9a8	add documentation for group by array based result format (#8416 )	2019-08-28 08:30:31 -07:00
Jonathan Wei	c626452b47	Add nano-quickstart single server example configuration (#8390 ) * Add nano-quickstart single server example configuration * Use two workers * Shrink processing buffers	2019-08-24 22:07:20 -07:00
Furkan KAMACI	02fe3db911	Zookeeper version is updated. (#8363 ) * Zookeeper version is updated. * Zookeeper version is updated at licenses.yaml * licenses.yaml is updated and dependencies are fixed to make the project successfully build. * Zookeeper versions are fixed at licenses.yaml	2019-08-24 22:00:43 -07:00
Jihoon Son	95fa609615	Fix wrong partitionsSpec type names in the document (#8297 ) * Fix wrong type names for partitionsSpec * add unit tests; add json properties for backward compatibility * beautify conf names * remove maxRowsPerSegment from hashed partitionsSpec * fix doc build	2019-08-23 13:44:58 -07:00
Clint Wylie	7749571a7f	order and add more ports to hadoop docker container in hadoop indexing tutorial (#8329 ) LGTM	2019-08-23 15:43:06 -05:00
Surekha	cf2a2dd917	Add group_id to the sys.tasks table (#8304 ) * Add group_id to overlord tasks API and sys.tasks table * adjust test * modify docs * Make groupId nullable * fix integration test * fix toString * Remove groupId from TaskInfo * Modify docs and tests * modify TaskMonitorTest	2019-08-22 15:28:23 -07:00
Clint Wylie	010f70b371	autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306 ) * migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time * +x * move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml * add missing hdfs-storage dependencies * revert to old syntax, fixes * formatting * update notices for recently updated dependencies	2019-08-21 12:46:27 -07:00
Gian Merlino	d007477742	Docusaurus build framework + ingestion doc refresh. (#8311 ) * Docusaurus build framework + ingestion doc refresh. * stick to npm instead of yarn * fix typos * restore some _bin * Adjustments. * detect and fix redirect anchors * update anchor lint * Web-console: remove specific column filters (#8343) * add clear filter * update tool kit * remove usless check * auto run * add % * Fix resource leak (#8337) * Fix resource leak * Patch comments * Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234) * Fixes from PR review. * Fix more anchors. * Preamble nix. * Fix more anchors, headers * clean up placeholder page * add to website lint to travis config * better broken link checking * travis fix * Fixed more broken links * better redirects * unfancy catch * fix LGTM error * link fixes * fix md issues * Addl fixes	2019-08-20 21:48:59 -07:00
Fokko Driesprong	d5a19675dd	Remove fromPigAvroStorage from the docs (#8340 ) This one has been deprecated a while ago	2019-08-20 16:34:55 -07:00
Jonathan Wei	dd2e53baf4	Clarify Avro decoder docs (#8302 )	2019-08-19 15:37:18 -05:00
Jihoon Son	31af4eb9ad	Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324 )	2019-08-16 15:57:13 -07:00
Jihoon Son	5dac6375f3	Add support for parallel native indexing with shuffle for perfect rollup (#8257 ) * Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks * kill runner when it's ready * add comment * kill run thread * fix test * Take closeable out of Appenderator * add javadoc * fix test * fix test * update javadoc * add javadoc about killed task * address comment * Add support for parallel native indexing with shuffle for perfect rollup. * Add comment about volatiles * fix test * fix test * handling missing exceptions * more clear javadoc for stopGracefully * unused import * update javadoc * Add missing statement in javadoc * address comments; fix doc * add javadoc for isGuaranteedRollup * Rename confusing variable name and fix typos * fix typos; move fetch() to a better home; fix the expiration time * add support https	2019-08-15 17:43:35 -07:00
Jihoon Son	eeae5d9365	Add a warning about experimental segment locking (#8301 ) * Add a warning about experimental segment locking * fix typo	2019-08-15 16:07:59 -07:00
Jihoon Son	a5c9c2950f	Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274 ) * Add missing tuningConfigs for auto compaciton * Add doc * add test	2019-08-13 14:10:26 -05:00
Alexandre Yang	6b4d028b96	[statsd-emitter] Add config to send Druid process/service as tag (#8238 ) * [statsd-emitter] Add serviceAsTag option * [statsd-emitter] Refactor serviceAsTag option * [statsd-emitter] Update statsd.md * [statsd-emitter] add default prefix * [statsd-emitter] update statsd.md * [statsd-emitter] Remove extra spaces * [statsd-emitter] Improve docs for config `dogstatsdServiceAsTag` * [statsd-emitter] Simplify equals() for StatsDEmitterConfig.java * [statsd-emitter] Add @Nullable for StatsDEmitterConfig.java	2019-08-12 13:18:44 -07:00
Nathan	b28e252d9a	Minor Spelling Error (#8277 ) * Minor Spelling Error * Update mySQL password in docs /extensions-core/mysql update druid.metadata.storage.connector.password	2019-08-09 16:06:02 -05:00
Jonathan Wei	e88bbe71c0	Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255 )	2019-08-07 23:04:07 -07:00
Jonathan Wei	5e57492298	Add docs for CliIndexer as an experimental feature (#8245 ) * Experimental CliIndexer docs * PR comments	2019-08-06 15:57:17 -07:00
Lucas Capistrant	e252abedc5	Enable toggling request logging on/off for different query types (#7562 ) * Enable ability to toggle SegmentMetadata request logging on/off * Move SegmentMetadata query log filter to FilteredRequestLogger * Update documentation to reflect the segment metadata flag moving to the filtered request logger * Modify patch to allow blacklist of query types to not log to request logger * Address styling and naming requests following latest code review * Fix indentation on multiple locations per Druid style rules	2019-08-06 15:47:30 +03:00
Samarth Jain	93cf9d4ad4	SQL support for t-digest based sketch aggregators (#8100 ) * SQL support for t-digest based sketch aggregators * Fix teamcity errors * Add missing dependencies * Remove unused dependency * Address code review comments * Add checks for compression param	2019-08-05 12:01:42 -07:00
Jihoon Son	1ee828ff49	Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173 ) * Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking * add more test * javadoc for missingIntervalsInOverwriteMode * Fix test * Address comments * avoid spotbugs	2019-08-02 20:30:05 -07:00
Chi Cao Minh	4bd3bad8ba	Add IPv4 SQL functions (#8223 ) * Add IPv4 SQL functions New SQL functions for filtering IPv4 addresses: - IPV4_MATCH: Check if IP address belongs to a subnet - IPV4_PARSE: Convert string IP address to integer - IPV4_STRINGIFY: Convert integer IP address to string These are the SQL analogs of the druid expressions with the same name. Filtering is more efficient when operating on IP addresses as integers instead of strings. * Refactor operator conversions into named constants	2019-08-01 21:29:58 -07:00
Clint Wylie	01c8c82982	correct kerberos doc extension load list (#8224 )	2019-08-01 17:03:25 -07:00
Chi Cao Minh	7783b31846	Add IPv4 druid expressions (#8197 ) * Add IPv4 druid expressions New druid expressions for filtering IPv4 addresses: - ipv4address_match: Check if IP address belongs to a subnet - ipv4address_parse: Convert string IP address to long - ipv4address_stringify: Convert long IP address to string These expressions operate on IP addresses represented as either strings or longs, so that they can be applied to dimensions with mixed representation of IP addresses. The filtering is more efficient when operating on IP addresses as longs. In other words, the intended use case is: 1) Use ipv4address_parse to convert to long at ingestion time 2) Use ipv4address_match to filter (on longs) at query time 3) Use ipv4adress_stringify to convert to (readable) string at query time * Fix licenses and null handling * Simplify IPv4 expressions * Fix tests * Fix check for valid ipv4 address string	2019-08-01 11:45:04 -07:00
Surekha	f0ecdfee30	Fix `is_realtime` column behavior in sys.segments table (#8154 ) * Fix is_realtime flag * make variable final * minor changes * Modify is_realtime behavior based on review comment * Fix UT	2019-07-31 22:26:49 -06:00
Nathan	716ce7fdc7	Spelling Error (#8206 )	2019-07-31 10:43:11 -07:00
Jihoon Son	385f492a55	Use PartitionsSpec for all task types (#8141 ) * Use partitionsSpec for all task types * fix doc * fix typos and revert to use isPushRequired * address comments * move partitionsSpec to core * remove hadoopPartitionsSpec	2019-07-30 17:24:39 -07:00
Clint Wylie	653b558134	sql firehose and firehose doc adjustments (#8067 ) * firehose doc adjustments * fix typo * additional information on parser types in ingestion docs * clarify ingest segment firehose docs, add sql firehose examples to sql extension pages * fixit * make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers * transforms * fix tests	2019-07-30 15:28:10 -07:00
Jonathan Wei	640b7afc1c	Add CliIndexer process type and initial task runner implementation (#8107 ) * Add CliIndexer process type and initial task runner implementation * Fix HttpRemoteTaskRunnerTest * Remove batch sanity check on PeonAppenderatorsManager * Fix paralle index tests * PR comments * Adjust Jersey resource logging * Additional cleanup * Fix SystemSchemaTest * Add comment to LocalDataSegmentPusherTest absolute path test * More PR comments * Use Server annotated with RemoteChatHandler * More PR comments * Checkstyle * PR comments * Add task shutdown to stopGracefully * Small cleanup * Compile fix * Address PR comments * Adjust TaskReportFileWriter and fix nits * Remove unnecessary closer * More PR comments * Minor adjustments * PR comments * ThreadingTaskRunner: cancel task run future not shutdownFuture and remove thread from workitem	2019-07-29 17:06:33 -07:00
Jihoon Son	61f4abece4	Add more warning to the doc for resetOffsetAutomatically (#8153 ) * Add more warnings to the doc for resetOffsetAutomatically * fix kinesis doc * fix typos * revise the description * capital * capitalize	2019-07-24 17:37:32 -07:00
Magnus Henoch	c87b47e0fa	More documentation formatting fixes (#8149 ) Add empty lines before bulleted lists and code blocks, to ensure that they show up properly on the web site. See also #8079.	2019-07-24 15:26:03 -07:00
Clint Wylie	b8b22b7aaa	fix references to bin/supervise in tutorial docs (#8087 )	2019-07-23 15:05:01 -07:00
Clint Wylie	83514958db	remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116 ) * remove unecessary lock in ForegroundCachePopulator leading to a lot of contention * mutableboolean, javadocs,document some cache configs that were missing * more doc stuff * adjustments * remove background documentation	2019-07-23 10:57:59 -07:00
Sashidhar Thallam	ea4bad7836	Druid SQL EXTRACT time function - adding support for additional Time Units (#8068 ) * 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935 * 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds * Fixing testInformationSchemaTables test * Fixing failing tests in DruidAvaticaHandlerTest * Adding cannotVectorize() call before the test * Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes. * Adding MILLISECOND in test case * Adding support DECADE and MILLENNIUM, updating test case and documentation * Fixing expression eval for DECADE and MILLENIUM	2019-07-19 20:38:32 -07:00
Roman Leventov	ceb969903f	Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653 ) * Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource * Style fixes * Unused imports * Fix tests * Fix style * Comments * Comment fix * Remove unresolvable Javadoc references; address comments * Add comments to ImmutableDruidDataSource * Merge with master * Fix bad web-console merge * Fixes in api-reference.md * Rename in DruidCoordinatorRuntimeParams * Fix compilation * Residual changes	2019-07-17 17:18:48 +03:00
Magnus Henoch	179253a2fc	Fix documentation formatting (#8079 ) The Markdown dialect used when publishing the documentation to the web site is much more sensitive than Github-flavoured Markdown. In particular, it requires an empty line before code blocks (unless the code block starts right after a heading), otherwise the code block gets formatted in-line with the previous paragraph. Likewise for bullet-point lists.	2019-07-15 09:55:18 -07:00
Gian Merlino	ffa25b7832	Query vectorization. (#6794 ) * Benchmarks: New SqlBenchmark, add caching & vectorization to some others. - Introduce a new SqlBenchmark geared towards benchmarking a wide variety of SQL queries. Rename the old SqlBenchmark to SqlVsNativeBenchmark. - Add (optional) caching to SegmentGenerator to enable easier benchmarking of larger segments. - Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark. * Query vectorization. This patch includes vectorized timeseries and groupBy engines, as well as some analogs of your favorite Druid classes: - VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.) - VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has methods to create analogs of the column selectors you know and love. - VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset. - VectorAggregator is like BufferAggregator. - VectorValueMatcher is like ValueMatcher. There are some noticeable differences between vectorized and regular execution: - Unlike regular cursors, vector cursors do not understand time granularity. They expect query engines to handle this on their own, which a new VectorCursorGranularizer class helps with. This is to avoid too much batch-splitting and to respect the fact that vector selectors are somewhat more heavyweight than regular selectors. - Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes for filters that might partially support them (like an OR of one filter that supports indexing and another that doesn't). I'm not sure that this behavior is desirable anyway (it is potentially too eager) but, at any rate, it'd be better to harmonize it between the two classes. Potentially they should both do some different thing that is smarter than what either of them is doing right now. 
- When vector cursors are created by QueryableIndexCursorSequenceBuilder, they use a morphing binary-then-linear search to find their start and end rows, rather than linear search. Limitations in this patch are: - Only timeseries and groupBy have vectorized engines. - GroupBy doesn't handle multi-value dimensions yet. - Vector cursors cannot handle virtual columns or descending order. - Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". - Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered". - Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs). Currently, the testing strategy includes adding vectorization-enabled tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest, GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the filtering tests that extend BaseFilterTest. In all of those classes, there are some test cases that don't support vectorization. They are marked by special function calls like "cannotVectorize" or "skipVectorize" that tell the test harness to either expect an exception or to skip the test case. Testing should be expanded in the future -- a project in and of itself. Related to #3011. * WIP * Adjustments for unused things. * Adjust javadocs. * DimensionDictionarySelector adjustments. * Add "clone" to BatchIteratorAdapter. * ValueMatcher javadocs. * Fix benchmark. * Fixups post-merge. * Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex. * BloomDimFilterSqlTest: Tag two non-vectorizable tests. * Minor adjustments. * Update surefire, bump up Xmx in Travis. * Some more adjustments. * Javadoc adjustments * AggregatorAdapters adjustments. * Additional comments. * Remove switching search. * Only missiles.	2019-07-12 12:54:07 -07:00
Chi Cao Minh	da3d141dd2	Add inline firehose (#8056 ) * Add inline firehose To allow users to quickly parsing and schema, add a firehose that reads data that is inlined in its spec. * Address review comments * Remove suppression of sonar warnings	2019-07-11 21:43:46 -07:00
Atul Mohan	631cda649b	Include replicated segment size property for datasources endpoint (#8039 ) * Add replication size * Summon comma	2019-07-11 01:10:38 -07:00
Himanshu	14aec7fcec	add config to optionally disable all compression in intermediate segment persists while ingestion (#7919 ) * disable all compression in intermediate segment persists while ingestion * more changes and build fix * by default retain existing indexingSpec for intermediate persisted segments * document indexSpecForIntermediatePersists index tuning config * fix build issues * update serde tests	2019-07-10 12:22:24 -07:00
Jihoon Son	0a3538b569	Fix license check in travis and make it optional (#8049 ) * Fix license check in travis and make it optional * debug * fix build * too loud maven * move MAVEN_OPTS to top and add comments * adjust script * remove mvn option from python script	2019-07-09 19:35:29 -07:00
Sashidhar Thallam	3353da2974	Adding missing docs for druid.indexer.logs.disableAcl (#8046 )	2019-07-09 16:11:25 -07:00
Jihoon Son	12f12676e3	Binary license management system (#7998 ) * Binary license management system * add missing file * add comment * Address comments * print missing licenses * print druid module name * Add missing licenses and update versions * fix library versions and add missing ones. also fix pom.xml * testing multi thread * Parallel report generation * fix build error * install pyyaml and use old api * install python3 * fix travis script * python3.6 * pip * setuptools * python3-setuptools * address comment * error on not found reports or registered licenses * removed licenses * debug * travis debug * add missing licenses * travis debug * debug * remove debug code * test build script * travis debug * still debug * add missing python lib * debug * debug * fix travis * fix travis * debug travis * flush print * print something more to keep travis alive * adjust print * single threaded * single threaded * debug * debug * remove debug * remove deprecated-2017Q4 from travis conf * remove comments and duplicate sudo	2019-07-08 12:24:51 -07:00
Eyal Yurman	2eee711653	Add missing reference to Materialized-View extension. (#8003 ) * Reference Materialized View extension from extensions page. * Add comma	2019-07-06 13:50:41 -07:00
Dinesh Sawant	9c7c7c58ae	Fix overlord port in delete data tutorial (#8037 ) In Single-Server Quickstart tutorial the overlord and coordinator is started as one process on port 8081. But in delete data tutorial the kill task is sent to 8090 port, which fails.	2019-07-06 08:50:01 -07:00
Chi Cao Minh	0ded0ce414	Add round support for DS-HLL (#8023 ) * Add round support for DS-HLL Since the Cardinality aggregator has a "round" option to round off estimated values generated from the HyperLogLog algorithm, add the same "round" option to the DataSketches HLL Sketch module aggregators to be consistent. * Fix checkstyle errors * Change HllSketchSqlAggregator to do rounding * Fix test for standard-compliant null handling mode	2019-07-05 15:37:58 -07:00
Clint Wylie	42a7b8849a	remove FirehoseV2 and realtime node extensions (#8020 ) * remove firehosev2 and realtime node extensions * revert intellij stuff * rat exclusion	2019-07-04 15:40:22 -07:00
Gian Merlino	613f09b45a	SQL: Add TIME_CEIL function. (#8027 ) Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to share more code.	2019-07-04 15:40:03 -07:00
Clint Wylie	3b84246cd6	add SQL docs for multi-value string dimensions (#8011 ) * add SQL docs for multi-value string dimensions * formatting consistency * fix typo * adjust	2019-07-03 08:22:33 -07:00
Clint Wylie	c556d44a19	more sql support for expression array functions (#7974 ) * more sql support for expression array functions * prepend/slice * doc fixes * fix imports * fix tests * add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval * re-arrange * imports :( * add append/prepend test	2019-07-02 21:39:26 -07:00
Clint Wylie	f7283378ac	remove deprecated standalone realtime node (#7915 ) * remove CliRealtime, RealtimeManager, etc * add redirects for deleted page to page that explains the deleted thing * adjust docs	2019-07-02 18:12:17 -07:00
Clint Wylie	93b738bbfa	expression language array constructor and sql multi-value string filtering support (#7973 ) * expr array constructor and sql multi-value string support * doc fix * checkstyle * change from feedback	2019-07-01 15:14:50 -07:00
Eyal Yurman	3650eed1aa	Improve pull-deps reference in extensions page. (#8002 )	2019-07-01 11:18:27 -07:00
Xue Yu	2831944056	support NVL sql function (#7965 ) * sql nvl * add nvl in sql doc	2019-06-30 13:14:30 -07:00
Jihoon Son	f148249f64	Fix wrong redirect for orc extension (#7983 )	2019-06-27 16:27:08 -07:00
Alexander Saydakov	f38a62e949	theta sketch to string post agg (#7937 )	2019-06-27 15:09:57 -07:00
Vadim Ogievetsky	ad45ef12ed	fix SQL doc comment (#7981 )	2019-06-27 15:05:45 -07:00
Jihoon Son	c4aaf26797	Add missing redirect for ORC extension document (#7979 )	2019-06-27 14:23:44 -07:00
Clint Wylie	10d6b0318d	clarify granularity docs (#7977 )	2019-06-27 08:51:22 -07:00
Xue Yu	5464c8938f	Add array_slice and array_unshift function expr (#7950 ) * add array_slice and array_unshift function expr * feedback address	2019-06-26 16:56:09 -07:00
Benedict Jin	16aafd5788	[ImgBot] Optimize images (#7873 ) *Total -- 10,997.25kb -> 7,160.16kb (34.89%) /publications/radstack/figures/precompute.png -- 54.20kb -> 16.97kb (68.69%) /web-console/favicon.png -- 4.41kb -> 1.61kb (63.58%) /docs/img/indexing_service.png -- 47.37kb -> 21.96kb (53.64%) /docs/img/segmentPropagation.png -- 62.94kb -> 29.85kb (52.57%) /docs/content/tutorials/img/tutorial-quickstart-01.png -- 55.62kb -> 29.13kb (47.62%) /docs/content/tutorials/img/tutorial-deletion-02.png -- 791.43kb -> 429.30kb (45.76%) /docs/content/tutorials/img/tutorial-deletion-03.png -- 786.79kb -> 427.05kb (45.72%) /docs/content/tutorials/img/tutorial-retention-00.png -- 135.06kb -> 75.88kb (43.82%) /docs/content/tutorials/img/tutorial-batch-data-loader-10.png -- 77.23kb -> 43.47kb (43.71%) /docs/content/tutorials/img/tutorial-batch-data-loader-01.png -- 97.03kb -> 55.16kb (43.15%) /docs/content/tutorials/img/tutorial-batch-data-loader-07.png -- 79.49kb -> 45.44kb (42.84%) /docs/content/tutorials/img/tutorial-retention-02.png -- 401.30kb -> 234.68kb (41.52%) /docs/content/tutorials/img/tutorial-compaction-06.png -- 343.27kb -> 201.87kb (41.19%) /docs/content/tutorials/img/tutorial-batch-data-loader-09.png -- 105.14kb -> 61.86kb (41.16%) /docs/content/tutorials/img/tutorial-retention-06.png -- 227.57kb -> 134.35kb (40.97%) /docs/content/tutorials/img/tutorial-compaction-04.png -- 304.83kb -> 180.04kb (40.94%) /docs/content/tutorials/img/tutorial-compaction-02.png -- 273.18kb -> 162.67kb (40.45%) /docs/content/tutorials/img/tutorial-query-05.png -- 85.03kb -> 50.64kb (40.44%) /publications/radstack/figures/druid_vs_bigquery.png -- 155.44kb -> 92.85kb (40.27%) /docs/content/tutorials/img/tutorial-kafka-02.png -- 122.51kb -> 73.93kb (39.65%) /docs/content/tutorials/img/tutorial-deletion-01.png -- 70.37kb -> 42.56kb (39.52%) /docs/content/tutorials/img/tutorial-batch-data-loader-06.png -- 103.50kb -> 62.79kb (39.33%) 
/docs/content/tutorials/img/tutorial-batch-submit-task-01.png -- 111.25kb -> 67.73kb (39.12%) /docs/content/tutorials/img/tutorial-query-03.png -- 103.60kb -> 63.51kb (38.69%) /docs/content/tutorials/img/tutorial-query-04.png -- 105.79kb -> 64.87kb (38.69%) /docs/content/tutorials/img/tutorial-batch-data-loader-11.png -- 130.20kb -> 81.34kb (37.53%) /docs/content/tutorials/img/tutorial-query-07.png -- 122.52kb -> 76.79kb (37.32%) /docs/content/tutorials/img/tutorial-kafka-01.png -- 133.12kb -> 83.47kb (37.3%) /docs/content/tutorials/img/tutorial-query-06.png -- 127.55kb -> 80.28kb (37.06%) /docs/content/tutorials/img/tutorial-batch-submit-task-02.png -- 133.07kb -> 84.06kb (36.83%) /docs/content/tutorials/img/tutorial-retention-05.png -- 60.19kb -> 38.08kb (36.74%) /docs/content/tutorials/img/tutorial-batch-data-loader-03.png -- 211.92kb -> 134.22kb (36.66%) /docs/content/tutorials/img/tutorial-batch-data-loader-05.png -- 250.36kb -> 158.68kb (36.62%) /publications/radstack/figures/radstack.png -- 16.80kb -> 10.67kb (36.48%) /docs/content/tutorials/img/tutorial-batch-data-loader-08.png -- 158.59kb -> 101.49kb (36%) /docs/content/tutorials/img/tutorial-batch-data-loader-04.png -- 255.10kb -> 163.33kb (35.97%) /docs/content/tutorials/img/tutorial-query-02.png -- 126.92kb -> 81.42kb (35.85%) /docs/content/tutorials/img/tutorial-compaction-01.png -- 53.86kb -> 34.87kb (35.25%) /docs/img/druid-architecture.png -- 202.23kb -> 130.97kb (35.24%) /docs/content/tutorials/img/tutorial-retention-01.png -- 52.69kb -> 34.35kb (34.81%) /docs/img/druid-timeline.png -- 35.87kb -> 23.59kb (34.22%) /docs/content/tutorials/img/tutorial-query-01.png -- 149.53kb -> 98.56kb (34.08%) /docs/content/tutorials/img/tutorial-retention-04.png -- 65.91kb -> 43.57kb (33.89%) /docs/content/tutorials/img/tutorial-compaction-08.png -- 42.24kb -> 28.08kb (33.53%) /docs/content/tutorials/img/tutorial-compaction-07.png -- 39.17kb -> 26.06kb (33.47%) 
/docs/content/tutorials/img/tutorial-compaction-03.png -- 39.17kb -> 26.13kb (33.3%) /docs/content/tutorials/img/tutorial-compaction-05.png -- 38.85kb -> 25.96kb (33.17%) /publications/demo/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%) /publications/radstack/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%) /publications/whitepaper/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%) /docs/content/tutorials/img/tutorial-retention-03.png -- 43.11kb -> 29.33kb (31.97%) /publications/radstack/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%) /publications/whitepaper/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%) /publications/demo/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%) /publications/radstack/figures/joined.png -- 164.14kb -> 113.47kb (30.87%) /docs/content/tutorials/img/tutorial-batch-data-loader-02.png -- 508.93kb -> 351.85kb (30.87%) /publications/radstack/figures/imps_clicks.png -- 190.95kb -> 132.70kb (30.51%) /publications/radstack/figures/shuffled.png -- 180.46kb -> 128.21kb (28.95%) /publications/radstack/figures/pipeline.png -- 392.54kb -> 281.93kb (28.18%) /docs/img/druid-manage-1.png -- 108.94kb -> 78.53kb (27.92%) /publications/radstack/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%) /publications/demo/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%) /publications/whitepaper/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%) /docs/img/druid-production.png -- 50.00kb -> 39.18kb (21.63%) /docs/img/druid-dataflow-3.png -- 88.25kb -> 69.75kb (20.96%) /publications/demo/figures/realtime_flow.png -- 51.12kb -> 40.61kb (20.56%) /publications/demo/figures/realtime_timeline.png -- 36.15kb -> 29.24kb (19.12%) /publications/demo/figures/tpch_scaling.png -- 43.21kb -> 34.97kb (19.08%) /publications/demo/figures/caching.png -- 35.26kb -> 29.09kb (17.49%) /dev/intellij-sdk-config.jpg -- 1,019.35kb -> 864.37kb (15.2%) /docs/img/druid-column-types.png -- 101.53kb -> 91.17kb (10.2%) /docs/img/druid-dataflow-2x.png -- 138.30kb -> 127.11kb (8.09%)	2019-06-24 21:27:48 -07:00
Jonathan Wei	35601bb7a0	Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784 ) * Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory * Add finalizeAsBase64Binary option to ApproximateHistogramFactory * Update approx histogram doc	2019-06-21 18:00:19 -07:00
Clint Wylie	494b8ebe56	multi-value string column support for expressions (#7588 ) * array support for expression language for multi-value string columns * fix tests? * fixes * more tests * fixes * cleanup * more better, more test * ignore inspection * license * license fix * inspection * remove dumb import * more better * some comments * add expr rewrite for arrayfn args for more magic, tests * test stuff * more tests * fix test * fix test * castfunc can deal with arrays * needs more empty array * more tests, make cast to long array more forgiving * refactor * simplify ExprMacro Expr implementations with base classes in core * oops * more test * use Shuttle for Parser.flatten, javadoc, cleanup * fixes and more tests * unused import * fixes * javadocs, cleanup, refactors * fix imports * more javadoc * more javadoc * more * more javadocs, nonnullbydefault, minor refactor * markdown fix * adjustments * more doc * move initial filter out * docs * map empty arg lambda, apply function argument validation * check function args at parse time instead of eval time * more immutable * more more immutable * clarify grammar * fix docs * empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..	2019-06-19 13:57:37 -07:00
Clint Wylie	71997c16a2	switch links from druid.io to druid.apache.org (#7914 ) * switch links from druid.io to druid.apache.org * fix it	2019-06-18 09:06:27 -07:00
Vadim Ogievetsky	24dd4573da	Added the web console to the quickstart tutorials and docs (#7863 ) * added console to the quickstart tutorials * feedback fixes * feedback fixes * more typo fixes * moved reseting cluster section after load data * update images * stage -> step * feedback fixes * more feedback fixes	2019-06-17 18:00:54 -07:00
Himanshu	b3328b2785	endpoint to delete lookup tier and remove tier on last lookup deletion (#7852 )	2019-06-15 17:55:50 -07:00
Justin Borromeo	8e5003b01c	Scan Doc Change (#7903 )	2019-06-15 01:21:34 -07:00
Jihoon Son	3cd9a7507d	Fix script for dependencies report for extensions (#7899 )	2019-06-14 18:53:50 -07:00
Jihoon Son	a648e1548d	Add support of --exclude-extension argument for dependency report script (#7786 )	2019-06-14 15:18:59 -07:00
Xue Yu	456a3654ce	add PolygonBound and missing extentions list doc (#7885 )	2019-06-13 12:03:58 -07:00
Clint Wylie	8117222da3	use right port for kafka tutorial, reinfoce that tutorials assume you are using micro-quickstart single-server configuration (#7862 )	2019-06-11 08:50:52 -07:00
Xue Yu	ce591d1457	Support var_pop, var_samp, stddev_pop and stddev_samp etc in sql (#7801 ) * support var_pop, stddev_pop etc in sql * fix sql compatible * rebase on master * update doc	2019-06-10 09:40:09 -07:00
Clint Wylie	3fbb0a5e00	Supervisor list api with states and health (#7839 ) * allow optionally listing all supervisors with their state and health * docs * add state to full * clean * casing * format * spelling	2019-06-07 16:26:33 -07:00
Jihoon Son	61ec521135	Remove keepSegmentGranularity option for compaction (#7747 ) * Remove keepSegmentGranularity option from compaction * fix it test * clean up * remove from web console * fix test	2019-06-03 12:59:15 -07:00
Jihoon Son	e289820bbd	Add a script to find missing backports (#7817 )	2019-06-03 07:56:52 -07:00
Eyal Yurman	69e9b8a464	Enables SQL by default. (#7808 )	2019-05-31 20:53:42 -07:00
Justin Borromeo	8032c4add8	Add errors and state to stream supervisor status API endpoint (#7428 ) * Add state and error tracking for seekable stream supervisors * Fixed nits in docs * Made inner class static and updated spec test with jackson inject * Review changes * Remove redundant config param in supervisor * Style * Applied some of Jon's recommendations * Add transience field * write test * implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth() * remove transience reporting and fix SeekableStreamSupervisorStateManager impl * move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests * remove stateHistory because it wasn't adding much value, some fixes, and add more tests * fix tests * code review changes and add HTTP health check status * fix test failure * refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager * fixup after merge * code review changes - add additional docs * cleanup KafkaIndexTaskTest * add additional documentation for Kinesis indexing * remove unused throws class	2019-05-31 17:16:01 -07:00
Jonathan Wei	83152a7a00	Fix performance-faq and remove insert-segment-to-db redirects (#7759 )	2019-05-24 13:20:02 -07:00
Jonathan Wei	cfb7756c9b	Fix references to removed performance FAQ page (#7755 )	2019-05-24 11:52:40 -07:00
Jonathan Wei	eb0e1a056c	Add limit to timeseries docs (#7750 )	2019-05-23 19:41:52 -07:00
Jonathan Wei	f2e34a76bd	Fix TOC clustering example link (#7749 )	2019-05-23 19:41:27 -07:00
Jonathan Wei	ec4d09a02f	Remove obsolete isExcluded config from Kerberos authenticator (#7745 )	2019-05-23 16:00:05 -07:00
awelsh93	6964ac23a2	Adding influxdb emitter as a contrib extension (#7717 ) * Adding influxdb emitter as a contrib extension * addressing code review comments	2019-05-23 11:11:48 -07:00
Fangjin Yang	3dec5cd1e4	reorganizing the ToC (#7734 )	2019-05-23 09:24:38 -07:00
gocho1	bd899b9224	add s3 authentication method informations (#7674 ) * add s3 authentication method informations * add druid.s3.fileSessionCredentials related content * remove authentication parameters to avoid confusion as it is more detailed in S3 Deep Storage page * streamline s3 docs	2019-05-22 11:46:02 -07:00
Gian Merlino	cbbce955de	SQL: Allow NULLs in place of optional arguments in many functions. (#7709 ) * SQL: Allow NULLs in place of optional arguments in many functions. Also adjust SQL docs to describe how to make time literals using TIME_PARSE (which is now possible in a nicer way). * Be less forbidden.	2019-05-21 11:54:34 -07:00
Gian Merlino	b6941551ae	Upgrade various build and doc links to https. (#7722 ) * Upgrade various build and doc links to https. Where it wasn't possible to upgrade build-time dependencies to https, I kept http in place but used hardcoded checksums or GPG keys to ensure that artifacts fetched over http are verified properly. * Switch to https://apache.org.	2019-05-21 11:30:14 -07:00
Xue Yu	dd7dace70a	Add TIMESTAMPDIFF sql support (#7695 ) * add timestampdiff sql support * feedback address	2019-05-21 08:05:38 -07:00
Vadim Ogievetsky	156322932f	Update Druid Console docs for 0.15.0 (#7697 ) * Update Druid Console docs for 0.15.0 * SQL -> query * added links and fix typos	2019-05-21 04:00:42 -07:00
andrewluotechnologies	1add566411	Fix typo (ComplexMetricSerde class name was spelled incorrectly) (#7694 )	2019-05-19 09:49:54 -07:00
Jihoon Son	94721de141	Add auto tagging milestone script (#7677 ) * Add auto tagging milestone script * fix usage * missing newline * missing newline	2019-05-16 23:11:16 -07:00
Clint Wylie	939b417379	Update tutorial-kafka.md (#7678 )	2019-05-16 23:10:45 -07:00
Jonathan Wei	d99f77a01b	Add option to use YARN RM as fallback for JobHistory failure (#7673 ) * Add option to use YARN RM as fallback for job status * PR comments	2019-05-16 13:59:10 -07:00
Fangjin Yang	dc85a5309e	some more doc improvements (#7675 )	2019-05-16 13:17:21 -07:00
Jonathan Wei	d667655871	Add basic tuning guide, getting started page, updated clustering docs (#7629 ) * Add basic tuning guide, getting started page, updated clustering docs * Add note about caching, fix tutorial paths * Adjust hadoop wording * Add license * Tweak * Shrink overlord heaps, fix tutorial urls * Tweak xlarge peon, update peon sizing * Update Data peon buffer size * Fix cluster start scripts * Add upper level _common to classpath * Fix cluster data/query confs * Address PR comments * Elaborate on connection pools * PR comments * Increase druid.broker.http.maxQueuedBytes * Add guidelines for broker backpressure * PR comments	2019-05-16 11:13:48 -07:00
Benedict Jin	3df364c472	Fix broken links in api-reference.md (#7670 )	2019-05-15 18:53:34 -07:00
Clint Wylie	c2abbc24a7	minor web console doc fixes (#7668 )	2019-05-15 18:52:51 -07:00
Surekha	d3545f5086	Show all server types in sys.servers table (#7654 ) * update sys.servers table to show all servers * update docs * Fix integration test * modify test query for batch integration test * fix case in test queries * make the server_type lowercase * Apply suggestions from code review Co-Authored-By: Himanshu <g.himanshu@gmail.com> * Fix compilation from git suggestion * fix unit test	2019-05-15 16:54:02 -07:00
Gian Merlino	0352f450d7	Fix broken links in docs, add broken link checker. (#7658 ) Also adds back insert-segment-to-db.md with some docs about why and when it was removed (in #6911).	2019-05-15 14:49:50 -07:00
Surekha	917106985f	Update tutorial to delete data (#7577 ) * Update tutorial to delete data * update tutorial, remove old ways to drop data * PR comments	2019-05-15 14:40:06 -07:00
Jonathan Wei	e874da7cea	Add simpler permissions option to BasicAuthorizer GET APIs (#7635 ) * Add simpler permissions option to BasicAuthorizer GET APIs * Adjust log message Co-Authored-By: Himanshu <g.himanshu@gmail.com> * Adjust log message Co-Authored-By: Himanshu <g.himanshu@gmail.com>	2019-05-15 12:59:32 -07:00
Clint Wylie	b87c8f0314	fix lookup editor to use lookup tiers instead of historical tiers (#7647 ) * fix lookup editor to use lookup tiers instead of historical tiers * use default tier if empty response, fix if configured lookups is null * fixes * fix typo	2019-05-14 13:30:51 -07:00
Alexander Saydakov	ca1a6649f6	Datasketches quantiles more post-aggs (#7550 ) * rank and CDF post-aggs * added post-aggs to the module * added new post-aggs * moved post-agg IDs * moved post-agg IDs	2019-05-10 11:46:54 -07:00
Clint Wylie	402d76a10f	make-redirects.py requires python3, explicitly specify it (#7625 )	2019-05-09 21:32:58 -07:00
Clint Wylie	6a6c6d573d	Add plain text README.txt, use relative link from README.md to build.md (#7611 ) * use relative link to build instructions from top level readme * add textfile to readme * formatting * make README.BINARY plaintext, move LABELS.md to LABELS, README.txt to README * exclude README.BINARY still * remove jdk links/recommmendations * add script to use DRUIDVERSION in textfile README instead of latest, add links to recommended jdk to build.md * license * better readme template, links to latest if does not detect an apache release version * fix	2019-05-09 21:29:26 -07:00
Samarth Jain	b542bb9f34	TDigest backed sketch aggregators (#7331 ) * First set of changes for tDigest histogram * Add license * Address code review comments * Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments. * Remove synchronization from BufferAggregators. Address code review comments * Fix typo	2019-05-09 17:22:55 -07:00
Magnus Henoch	2ac112151f	Fix formatting in scan query documentation (#7622 ) Escape underscores in `__time`, so they're not interpreted as bold formatting.	2019-05-09 11:32:37 -07:00
Jinseon Lee	0ef435a16c	add postgresql meta db table schema configuration property (#7137 ) (#7183 ) * add postgresql meta db table schema configuration property (#7137) If the postgresql db schema changes, you must set the configuration values. You do not need to set it if there is no change from the default schema 'public'. druid.metadata.postgres.dbTableSchema=public * create postgresql metadb table schema configuration property (#7137) If the postgresql db schema changes, you must set the configuration values. You do not need to set it if there is no change from the default schema 'public'. druid.metadata.postgres.dbTableSchema=public check PostgreSQLTablesConfig.java * modify postgresql readme file. - metadb table schema (#7137) If the postgresql db schema changes, you must set the configuration values. You do not need to set it if there is no change from the default schema 'public'. druid.metadata.postgres.dbTableSchema=public check PostgreSQLTablesConfig.java	2019-05-08 12:56:30 -07:00
Jonathan Wei	dadf6a2f11	Add tool for migrating from local deep storage/Derby metadata (#7598 ) * Add tool for migrating from local deep storage/Derby metadata * Split deep storage and metadata migration docs * Support import into Derby * Fix create tables cmd * Fix create tables cmd * Fix commands * PR comment * Add -p	2019-05-06 23:39:40 -07:00
Jonathan Wei	7c2ca474da	Add single-machine deployment example cfgs and scripts (#7590 ) * Add single-machine deployment example cfgs and scripts * Add (8u92+) * Use combined coordinator-overlord for single machine confs * RAT fix	2019-05-06 19:11:13 -07:00
Gian Merlino	727b65c7e5	Remove SQL experimental banner and other doc adjustments. (#7591 ) * Remove SQL experimental banner and other doc adjustments. Also, - Adjust the ToC and other docs a bit so SQL and native queries are presented on more equal footing. - De-emphasize querying historicals and peons directly in the native query docs. This is a really niche thing and may have been confusing to include prominently in the very first paragraph. - Remove DataSketches and Kafka indexing service from the experimental features ToC. They are not experimental any longer and were there in error. * More notes. * Slight tweak. * Remove extra extra word. * Remove RT node from ToC.	2019-05-06 12:31:51 -07:00
Samarth Jain	afbcb9c07f	Improve parallelism of zookeeper based segment change processing (#7088 ) * V1 - improve parallelism of zookeeper based segment change processing * Create zk nodes in batches. Address code review comments. Introduce various configs. * Add documentation for the newly added configs * Fix test failures * Fix more test failures * Remove prinstacktrace statements * Address code review comments * Use a single queue * Address code review comments Since we have a separate load peon for every historical, just having a single SegmentChangeProcessor task per historical is enough. This commit also gets rid of the associated config druid.coordinator.loadqueuepeon.curator.numCreateThreads * Resolve merge conflict * Fix compilation failure * Remove batching since we already have a dynamic config maxSegmentsInNodeLoadingQueue that provides that control * Fix NPE in test * Remove documentation for configs that are no longer needed * Address code review comments * Address more code review comments * Fix checkstyle issue * Address code review comments * Code review comments * Add back monitor node remove executor * Cleanup code to isolate null checks and minor refactoring * Change param name since it conflicts with member variable name	2019-05-03 15:58:42 +02:00
Jonathan Wei	a013350018	Adjust required permissions for system schema (#7579 ) * Adjust required permissions for system schema * PR comments, fix current_size handling * Checkstyle * Set curr_size instead of current_size * Adjust information schema docs * Fix merge conflict * Update tests	2019-05-02 07:18:02 -07:00
Surekha	15d19f3059	Add is_overshadowed column to sys.segments table (#7425 ) * Add is_overshadowed column to sys.segments table * update docs * Rename class and variables * PR comments * PR comments * remove unused variables in MetadataResource * move constants together * add getFullyOvershadowedSegments method to ImmutableDruidDataSource * Fix compareTo of SegmentWithOvershadowedStatus * PR comment * PR comments * PR comments * PR comments * PR comments * fix issue with already consumed stream * minor refactoring * PR comments	2019-05-01 18:00:57 +02:00
Gian Merlino	c648775b5b	SQL: Remove "useFallback" feature. (#7567 ) This feature allows Calcite's Bindable interpreter to be bolted on top of Druid queries and table scans. I think it should be removed for a few reasons: 1. It is not recommended for production anyway, because it generates unscalable query plans (e.g. it will plan a join into two table scans and then try to do the entire join in memory on the broker). 2. It doesn't work with Druid-specific SQL functions, like TIME_FLOOR, REGEXP_EXTRACT, APPROX_COUNT_DISTINCT, etc. 3. It makes the SQL planning code needlessly complicated. With SQL coming out of experimental status soon, it's a good opportunity to remove this feature.	2019-04-28 18:26:44 -07:00
Eyal Yurman	f02251ab2d	Contributing Moving-Average Query to open source. (#6430 ) * Contributing Moving-Average Query to open source. * Fix failing code inspections. * See if explicit types will invoke the correct comparison function. * Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter. * Update styling and headers for complience. * Refresh code with latest master changes: * Remove NullDimensionSelector. * Apply changes of RequestLogger. * Apply changes of TimelineServerView. * Small checkstyle fix. * Checkstyle fixes. * Fixing rat errors; Teamcity errors. * Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved. * Implements some of the review fixes. * Contributing Moving-Average Query to open source. * Fix failing code inspections. * See if explicit types will invoke the correct comparison function. * Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter. * Update styling and headers for complience. * Refresh code with latest master changes: * Remove NullDimensionSelector. * Apply changes of RequestLogger. * Apply changes of TimelineServerView. * Small checkstyle fix. * Checkstyle fixes. * Fixing rat errors; Teamcity errors. * Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved. * Implements some of the review fixes. * More fixes for review. * More fixes from review. * MapBasedRow is Unmodifiable. Create new rows instead of modifying existing ones. * Remove more changes related to datasketches support. * Refactor BaseAverager startFrom field and add a comment. * fakeEvents field: Refactor initialization and add comment. * Rename parameters (tiny change). * Fix variable name typo in test (JAN_4). * Fix styling of non camelCase fields. * Fix Preconditions.checkArgument for cycleSize. 
* Add more documentation to RowBucketIterable and other classes. * key/value comment on in MovingAverageIterable. * Fix anonymous makeColumnValueSelector returning null. * Replace IdentityYieldingAccumolator with Yielders.each(). * * internalNext() should return null instead of throwing exception. * Remove unused variables/prarameters. * Harden MovingAverageIterableTest (Switch anyOf to exact match). * Change internalNext() from recursion to iteration; Simplify next() and hasNext(). * Remove unused imports. * Address review comments. * Rename fakeEvents to emptyEvents. * Remove redundant parameter key from computeMovingAverage. * Check yielder as well in RowBucketIterable#hasNext() * Fix javadoc.	2019-04-26 17:07:48 -07:00
Adam Peck	ebdf07b69f	Add reload by interval API (#7490 ) * Add reload by interval API Implements the reload proposal of #7439 Added tests and updated docs * PR updates * Only build timeline with required segments Use 404 with message when a segmentId is not found Fix typo in doc Return number of segments modified. * Fix checkstyle errors * Replace String.format with StringUtils.format * Remove return value * Expand timeline to segments that overlap for intervals Restrict update call to only segments that need updating. * Only add overlapping enabled segments to the timeline * Some renames for clarity Added comments * Don't rely on cached poll data Only fetch required information from DB * Match error style * Merge and cleanup doc * Fix String.format call * Add unit tests * Fix unit tests that check for overshadowing	2019-04-26 16:01:50 -07:00
Clint Wylie	09b7700d13	fix docs (#7556 )	2019-04-25 22:00:37 -07:00
Justin Borromeo	012ab02bf4	Update select doc disclaimer (#7554 )	2019-04-25 19:23:39 -07:00
Surekha	8308ffef1f	API to drop data by interval (#7494 ) * Add api to drop data by interval * update to address comments * unused imports * PR comments + add tests in SQLMetadataSegmentManagerTest * update tests and docs	2019-04-25 14:24:40 -07:00
Jonathan Wei	658fb2b062	Fix bugs in milestone contributor script (#7545 ) * Only check PRs in milestone contributor script * Fix no-pagination bug	2019-04-24 22:11:57 -07:00
Jonathan Wei	8b1a4e18dd	Additional Apache branding doc updates (#7524 )	2019-04-23 14:39:16 -07:00
Xue Yu	2c8a71f883	Support LPAD and RPAD sql function (#7388 ) * lpad and rpad sql function * feedback address * feedback address * add doc and format * update docs	2019-04-22 14:51:32 -07:00
Jonathan Wei	3487663de9	Adjust approx agg deprecation wording (#7518 )	2019-04-19 19:31:50 -07:00
Jonathan Wei	74960e82bf	Add more Apache branding to docs (#7515 )	2019-04-19 15:52:26 -07:00
Slim Bouguerra	5463ecb979	Fix broken link due to Typo. (#7513 ) Change-Id: I5792f89ed6afe945f386058edd44f0400998460a	2019-04-19 09:58:54 -07:00
Jonathan Wei	8078f567aa	Update kafka version in tutorials (#7500 )	2019-04-17 14:56:29 -07:00
Kazuhito Takeuchi	7c19c92a81	Add ROUND function in druid-sql. (#7224 ) * Implement round function in druid-sql * Return value according to the type of argument * Fix codes for abnoraml inputs, updated math-expr.md * Fix assert text * Fix error messages and refactor codes * Fix compile error, update sql.md, refactor codes and format tests	2019-04-16 11:15:39 -07:00
Lucas Capistrant	8acad27d99	Enhance the Http Firehose to work with URIs requiring basic authentication (#7145 ) * Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication * Improve security of enhanced HttpFirehoseFactory by not logging auth credentials * Fix checkstyle failure in HttpFirehoseFactory.java * Update docs and fix TeamCity build with required noinspection * Indentation cleanup and logic modification for HttpFirehose object stream * Remove default Empty string password provider in http firehose * Add JavaDoc for MixIn describing its intended use * Reverting documentation notation for json code to be inline with rest of doc * Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs * Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler	2019-04-15 14:29:01 -07:00
Justin Borromeo	85f10ed0d0	Support querying realtime segments using time-ordered scan queries and fix broken scan queries without time column (#7454 ) * Update scan query runner factory to accept SpecificSegmentSpec * nit * Sorry travis * Improve logging and fix doc * Bug fix * Friendlier error msgs and tests to cover bug * Address Gian's comments * Fix doc * Added tests for empty and null column list * Style * Fix checking wrong order (looking at query param when it should be looking at the null-handled order) * Add test case for null order * Fix ScanQueryRunnerTest * Forbidden APIs fixed	2019-04-12 19:08:34 -07:00
zhaojiandong	1d9450da81	Some docs optimization (#6890 ) * some markdown docs optimization * markdown escape	2019-04-12 17:30:57 -07:00
Gian Merlino	2470b3279f	SQL: Fix docs for STRING_FORMAT. (#7455 )	2019-04-11 21:57:28 -07:00
Gian Merlino	a517f8ce49	Coordinator: Allow dropping all segments. (#7447 ) Removes the coordinator sanity check that prevents it from dropping all segments. It's useful to get rid of this, since the behavior is unintuitive for dev/testing clusters where users might regularly want to drop all their data to get back to a clean slate. But the sanity check was there for a reason: to prevent a race condition where the coordinator might drop all segments if it ran before the first metadata store poll finished. This patch addresses that concern differently, by allowing methods in MetadataSegmentManager to return null if a poll has not happened yet, and canceling coordinator runs in that case. This patch also makes the "dataSources" reference in SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile before, but it seems necessary to me: it's not final, and it's dereferenced from multiple threads without synchronization.	2019-04-11 08:45:38 -07:00
Justin Borromeo	408e3e1b2a	Remove select execution code from SQL planner (#7416 ) * Removed select execution code from SQL planner * Update doc	2019-04-10 22:32:57 -07:00
Benjamin Hopp	78e6f6fb38	Updated Javascript Affinity config docs (#7441 ) Updated with hostname:port rather than IP Address.	2019-04-10 21:44:50 -07:00
Benedict Jin	2f64414ade	Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions (#7334 ) * Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions * Fix ImportOrder * Use RuntimeException instead of OutOfMemoryError according to "Effective Java" * Simplify * Patch suggestions	2019-04-10 11:46:29 +08:00
Clint Wylie	89bb43f382	'core' ORC extension (#7138 ) * orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs * change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling * better docs and tests * fix it * formatting * doc fix * fix it * exclude redundant dependencies * use latest orc-mapreduce, add hadoop jobProperties recommendations to docs * doc fix * review stuff and fix binaryAsString * cache for root level fields * more better	2019-04-09 09:03:26 -07:00
Justin Borromeo	799c66d9ac	Allow max rows and max segments for time-ordered scans to be overridden using the scan query JSON spec (#7413 ) * Initial changes * Fixed NPEs * Fixed failing spec test * Fixed failing Calcite test * Move configs to context * Validated and added docs * fixed weird indentation * Update default context vals in doc * Fixed allowable values	2019-04-07 20:12:52 -07:00
Clint Wylie	e28a15f9f5	fix expressions docs operator table (#7420 ) * fix expressions docs operator table * Update math-expr.md	2019-04-07 20:12:00 -07:00
Justin Borromeo	e23fd41fa7	Update SQL doc for planning change (#7415 )	2019-04-05 15:14:07 -07:00
Jonathan Wei	0f6cb1e7e0	Update theta/hll sketch doc comparison (#7407 )	2019-04-03 15:21:33 -07:00
Gian Merlino	8c104a115c	SQL: Add STRING_FORMAT function. (#7327 )	2019-04-03 17:09:54 -04:00
David Glasser	4e23c11345	Make IngestSegmentFirehoseFactory splittable for parallel ingestion (#7048 ) * Make IngestSegmentFirehoseFactory splittable for parallel ingestion * Code review feedback - Get rid of WindowedSegment - Don't document 'segments' parameter or support splitting firehoses that use it - Require 'intervals' in WindowedSegmentId (since it won't be written by hand) * Add missing @JsonProperty * Integration test passes * Add unit test * Remove two FIXME comments from CompactionTask I'd like to leave this PR in a potentially mergeable state, but I still would appreciate reviewer eyes on the questions I'm removing here. * Updates from code review	2019-04-02 14:59:17 -07:00
Xue Yu	78fd5aff21	support radians and degrees in sql (#7336 ) * support radians and degrees in sql * update test case	2019-04-02 12:47:49 -07:00
Qi Shu	134f71d1b4	Add documentation for Druid native query in SQL view of web console (#7381 ) * Add docmentation for Druid native query in SQL view of web console * Edit sentence	2019-04-02 12:20:51 -07:00
Michael Trelinski	347779b17a	Zookeeper loss (#6740 ) * Update init Fix bin/init to source from proper directory. * Fix for Proposal #6518: Shutdown druid processes upon complete loss of ZK connectivity * Zookeeper Loss: - Add feature documentation - Cosmetic refactors - Variable extractions - Remove getter * - Change config key name and reword documentation - Switch from Function<Void,Void> to Runnable/Lambda - try { … } finally { … } * Fix line length too long * - change to formatted string for logging - use System.err.println after lifecycle stops * commenting on makeEnsembleProvider()-created Zookeeper termination * Add javadoc * added java doc reference back to apache discussion thread. * move comment to other class * favor two-slash comments instead of multiline comments	2019-03-29 15:10:42 -07:00
Justin Borromeo	ad7862c58a	Time Ordering On Scans (#7133 ) * Moved Scan Builder to Druids class and started on Scan Benchmark setup * Need to form queries * It runs. * Stuff for time-ordered scan query * Move ScanResultValue timestamp comparator to a separate class for testing * Licensing stuff * Change benchmark * Remove todos * Added TimestampComparator tests * Change number of benchmark iterations * Added time ordering to the scan benchmark * Changed benchmark params * More param changes * Benchmark param change * Made Jon's changes and removed TODOs * Broke some long lines into two lines * nit * Decrease segment size for less memory usage * Wrote tests for heapsort scan result values and fixed bug where iterator wasn't returning elements in correct order * Wrote more tests for scan result value sort * Committing a param change to kick teamcity * Fixed codestyle and forbidden API errors * . * Improved conciseness * nit * Created an error message for when someone tries to time order a result set > threshold limit * Set to spaces over tabs * Fixing tests WIP * Fixed failing calcite tests * Kicking travis with change to benchmark param * added all query types to scan benchmark * Fixed benchmark queries * Renamed sort function * Added javadoc on ScanResultValueTimestampComparator * Unused import * Added more javadoc * improved doc * Removed unused import to satisfy PMD check * Small changes * Changes based on Gian's comments * Fixed failing test due to null resultFormat * Added config and get # of segments * Set up time ordering strategy decision tree * Refactor and pQueue works * Cleanup * Ordering is correct on n-way merge -> still need to batch events into ScanResultValues * WIP * Sequence stuff is so dirty :( * Fixed bug introduced by replacing deque with list * Wrote docs * Multi-historical setup works * WIP * Change so batching only occurs on broker for time-ordered scans Restricted batching to broker for time-ordered queries and adjusted tests Formatting Cleanup * Fixed mistakes in merge * Fixed failing tests * Reset config * Wrote tests and added Javadoc * Nit-change on javadoc * Checkstyle fix * Improved test and appeased TeamCity * Sorry, checkstyle * Applied Jon's recommended changes * Checkstyle fix * Optimization * Fixed tests * Updated error message * Added error message for UOE * Renaming * Finish rename * Smarter limiting for pQueue method * Optimized n-way merge strategy * Rename segment limit -> segment partitions limit * Added a bit of docs * More comments * Fix checkstyle and test * Nit comment * Fixed failing tests -> allow usage of all types of segment spec * Fixed failing tests -> allow usage of all types of segment spec * Revert "Fixed failing tests -> allow usage of all types of segment spec" This reverts commit `ec470288c7`. * Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge" This reverts commit `57033f36df`, reversing changes made to `8f01d8dd16`. * Check type of segment spec before using for time ordering * Fix bug in numRowsScanned * Fix bug messing up count of rows * Fix docs and flipped boolean in ScanQueryLimitRowIterator * Refactor n-way merge * Added test for n-way merge * Refixed regression * Checkstyle and doc update * Modified sequence limit to accept longs and added test for long limits * doc fix * Implemented Clint's recommendations	2019-03-28 14:37:09 -07:00
Surekha	be318f4de3	Add column type to sys table docs (#7359 ) * Add column type * oops should be used=1	2019-03-27 20:21:57 -07:00
Charles Allen	eeb3dbe79d	Move GCP to a core extension (#6953 ) * Move GCP to a core extension * Don't provide druid-core >.< * Keep AWS and GCP modules separate * Move AWSModule to its own module * Add aws ec2 extension and more modules in more places * Fix bad imports * Fix test jackson module * Include AWS and GCP core in server * Add simple empty method comment * Update version to 15 * One more 0.13.0-->0.15.0 change * Fix multi-binding problem * Grep for s3-extensions and update docs * Update extensions.md	2019-03-27 09:00:43 -07:00
Justin Borromeo	c7fea6ac8f	Added better QueryInterruptedException error message for UnsupportedOperationException (#7248 ) * Added error message for UOE * Updated docs * Doc change * Doc change	2019-03-26 15:20:24 -07:00
Gian Merlino	4ca5fe0f60	SQL: Add PARSE_LONG function. (#7326 ) * SQL: Add PARSE_LONG function. * Fix test.	2019-03-22 15:40:10 -07:00
Vadim Ogievetsky	e4f2dcacf2	Druid console docs (#7300 ) * console docs * fix typo	2019-03-21 00:37:33 -07:00
Justin Borromeo	ff94bd16e6	Fix conflicting information in configuration doc (#7299 ) * Doc fix * Fix typo	2019-03-19 14:55:58 -07:00
Qi Shu	5406aaa49d	Add SQL auto complete in druid console (#7244 ) * Add SQL auto complete in druid console * Add comment in sql.md to alert user to change create-sql-function-doc if sql.md format gets changed	2019-03-16 01:45:53 -07:00
Jihoon Son	892d1d35d6	Deprecate NoneShardSpec and drop support for automatic segment merge (#6883 ) * Deprecate noneShardSpec * clean up noneShardSpec constructor * revert unnecessary change * Deprecate mergeTask * add more doc * remove convert from indexMerger * Remove mergeTask * remove HadoopDruidConverterConfig * fix build * fix build * fix teamcity * fix teamcity * fix ServerModule * fix compilation * fix compilation	2019-03-15 23:29:25 -07:00
Atul Mohan	2daeb50008	Add support for optional client authentication on TLS (#7250 ) * Add optional client auth * Add docs	2019-03-15 15:14:34 -07:00
Hongze Zhang	f9d99b245b	Add missing doc link for operations/http-compression.html; Fix magic numbers in test cases using JettyServerInitUtils.wrapWithDefaultGzipHandler (#7110 )	2019-03-13 14:09:19 -07:00
Clint Wylie	3895914aa2	consolidate CompressionUtils.java since now in the same jar (#6908 )	2019-03-13 11:02:44 -04:00
Gian Merlino	9178793ab5	Further improve caching documentation. (#7236 ) Follow-up to #7223 that fixes a doc bug (a result-level cache property was misspelled), changes the recommended "small cluster" threshold from 20 to 5 servers, and clarifies behavior of the various caching options.	2019-03-11 17:57:00 -07:00
Pierre-Emile Ferron	a88fbcd5db	Improve caching doc (#7223 ) - Set correct default values for query context result cache parameters - Add details about broker cache impact on local historical merging	2019-03-11 20:06:28 -04:00
Venkatraman P	3118160387	Adding a tutorial in doc for using Kerberized Hadoop as deep storage. (#6863 ) * Adding a tutorial in doc for using Kerberized Hadoop as deep storage. * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md Fixed - to ~ in Apache License section. * Update tutorial-kerberos-hadoop.md * Update tutorial-kerberos-hadoop.md	2019-03-11 11:39:15 -07:00
Jonathan Wei	e1d8c17746	Add commit ID milestone helper script (#7100 ) * Add commit ID milestone helper script * Filter on merged/closed in API call	2019-03-11 11:36:07 -07:00
Jonathan Wei	94463b5778	Add missing redirects and fix broken links (#7213 ) * Add missing redirects * Fix zookeeper redirect * Fix broken links	2019-03-09 15:16:23 -08:00
jorbay-au	62f0de9b89	Remove outdated instruction for rule updates (#7205 )	2019-03-08 16:42:08 -08:00
Clint Wylie	a44df6522c	rename maintenance mode to decommission (#7154 ) * rename maintenance mode to decommission * review changes * missed one * fix straggler, add doc about decommissioning stalling if no active servers * fix missed typo, docs * refine docs * doc changes, replace generals * add explicit comment to mention suppressed stats for balanceTier * rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs * fix precondition check * decommissioningMaxPercentOfMaxSegmentsToMove * fix test * fix test * fixes	2019-03-08 16:33:51 -08:00
Jihoon Son	e48a9c138e	Reduce default max # of subTasks to 1 for native parallel task (#7181 ) * Reduce # of max subTasks to 2 * fix typo and add more doc * add more doc and link * change default and add warning * fix doc * add test * fix it test	2019-03-05 22:06:36 -08:00
Jonathan Wei	9183e32876	Add more approximate algorithm docs (#7195 )	2019-03-05 16:44:02 -08:00
Xue Yu	65118277a3	support sin cos etc trigonometric function in sql (#7182 ) * support triangle function in sql * feedback address	2019-03-04 19:18:22 -08:00
Jonathan Wei	5486c2abf8	Update LICENSE and NOTICE files (#7026 ) * Update LICENSE and NOTICE files * Update react-table version	2019-03-04 18:45:22 -08:00
Roman Leventov	10c9f6d708	Fix and document concurrency of EventReceiverFirehose and TimedShutoffFirehose; Refine concurrency specification of Firehose (#7038 ) #### `EventReceiverFirehoseFactory` Fixed several concurrency bugs in `EventReceiverFirehoseFactory`: - Race condition over putting an entry into `producerSequences` in `checkProducerSequence()`. - `Stopwatch` used to measure time across threads, but it's a non-thread-safe class. - Use `System.nanoTime()` instead of `System.currentTimeMillis()` because the latter are [not suitable](https://stackoverflow.com/a/351571/648955) for measuring time intervals. - `close()` was not synchronized by could be called from multiple threads concurrently. Removed unnecessary `readLock` (protecting `hasMore()` and `nextRow()` which are always called from a single thread). Removed unnecessary `volatile` modifiers. Documented threading model and concurrent control flow of `EventReceiverFirehose` instances. Important: please read the updated Javadoc for `EventReceiverFirehose.addAll()`. It allows events from different requests (batches) to be interleaved in the buffer. Is this OK? #### `TimedShutoffFirehoseFactory` - Fixed a race condition that was possible because `close()` that was not properly synchronized. Documented threading model and concurrent control flow of `TimedShutoffFirehose` instances. #### `Firehose` Refined concurrency contract of `Firehose` based on `EventReceiverFirehose` implementation. Importantly, now it states that `close()` doesn't affect `hasMore()` and `nextRow()` and could be called concurrently with them. In other words, specified that `close()` is for "row supply" side rather than "row consume" side. However, I didn't check that other `Firehose` implementatations adhere to this contract. <hr> This issue is the result of reviewing `EventReceiverFirehose` and `TimedShutoffFirehose` using [this checklist](https://medium.com/@leventov/code-review-checklist-java-concurrency-49398c326154).	2019-03-04 18:50:03 -03:00
Jihoon Son	ded03d9d4c	Improve doc for auto compaction (#7117 ) * Improve doc for auto compaction * fix doc * address comments	2019-03-02 12:21:50 -08:00
Jihoon Son	45f12de9ad	Fix supported file formats for Hadoop vs Native batch doc (#7069 ) * Fix supported file formats * address comment	2019-02-28 19:44:45 -08:00
Jonathan Wei	32c418fdd8	Reword 'node' to 'process' (#7172 )	2019-02-28 18:10:39 -08:00
Jonathan Wei	a0afd7931d	Add web consoles doc page (#7123 ) * Add web consoles doc page * PR comments * Remove 'unified' * PR comments * Fix TOC * PR comments * More revisions * GUI -> UI * Update router docs * Reword router doc	2019-02-28 14:02:39 -08:00
Jonathan Wei	0b4f771062	Exclude hadoop-lzo from thrift-extensions build (#7151 )	2019-02-27 19:57:53 -08:00
Jonathan Wei	3d247498ef	Update tutorials for 0.14.0-incubating (#7157 )	2019-02-27 19:50:31 -08:00
Jihoon Son	6b232d8195	Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true (#7079 ) * Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true * typo * add a warning	2019-02-27 16:02:51 -08:00
Jihoon Son	4e2b085201	Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911 ) * Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file * delete descriptor.file when killing segments * fix test * Add doc for ha * improve warning	2019-02-20 15:10:29 -08:00
Mingming Qiu	dd34691004	Coordinator await initialization before finishing startup (#6847 ) * Curator server inventory await initialization * address comments * print exception object in log * remove throws ISE * cachingCost awaitInitialization default to false	2019-02-20 11:56:23 -08:00
David Glasser	a81b1b8c9c	index_parallel: support !appendToExisting with no explicit intervals (#7046 ) * index_parallel: support !appendToExisting with no explicit intervals This enables ParallelIndexSupervisorTask to dynamically request locks at runtime if it is run without explicit intervals in the granularity spec and with appendToExisting set to false. Previously, it behaved as if appendToExisting was set to true, which was undocumented and inconsistent with IndexTask and Hadoop indexing. Also, when ParallelIndexSupervisorTask allocates segments in the explicit interval case, fail if its locks on the interval have been revoked. Also make a few other additions/clarifications to native ingestion docs. Fixes #6989. * Review feedback. PR description on GitHub updated to match. * Make native batch ingestion partitions start at 0 * Fix to previous commit * Unit test. Verified to fail without the other commits on this branch. * Another round of review * Slightly scarier warning	2019-02-20 10:54:26 -08:00
Surekha	2b04e6d0bc	add note on consistency of results for sys.segments queries (#7034 ) * add doc * change docs * PR comments * few more changes	2019-02-19 10:52:37 -08:00
Clint Wylie	cadb6c5280	Missing Overlord and MiddleManager api docs (#7042 ) * document middle manager api * re-arrange * correction * document more missing overlord api calls, minor re-arrange of some code i was referencing * fix it * this will fix it * fixup * link to other docs	2019-02-19 10:52:05 -08:00
Surekha	80a2ef7be4	Support kafka transactional topics (#5404 ) (#6496 ) * Support kafka transactional topics * update kafka to version 2.0.0 * Remove the skipOffsetGaps option since it's not used anymore * Adjust kafka consumer to use transactional semantics * Update tests * Remove unused import from test * Fix compilation * Invoke transaction api to fix a unit test * temporary modification of travis.yml for debugging * another attempt to get travis tasklogs * update kafka to 2.0.1 at all places * Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes * Add deprecated in docs for kafka-eight and kafka-simple extensions * Remove skipOffsetGaps and code changes for transaction support * Fix indentation * remove skipOffsetGaps from kinesis * Add transaction api to KafkaRecordSupplierTest * Fix indent * Fix test * update kafka version to 2.1.0	2019-02-18 11:50:08 -08:00
scrawfor	0fa9000849	Add Postgresql SqlFirehose (#6813 ) * Add Postgresql SqlFirehose * Fix Code Style. * Fix style. * Fix Import Order. * Add Line Break before package.	2019-02-14 22:52:03 -08:00
awelsh93	ee91e27fe7	Update api-reference.md doc (#7065 ) - moving description of coordinator isLeader endpoint	2019-02-14 14:38:09 +00:00
Edward Gan	90c1a54b86	Moments Sketch custom aggregator (#6581 ) * Moments Sketch Integration with Druid * updates, add documentation, fix warnings * nits * disallowed base64 * update to druid 0.14	2019-02-13 14:03:47 -08:00
Jihoon Son	970308463d	Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044 ) * Add doc for Hadoop-based ingestion vs Native batch ingestion * add links * add links	2019-02-13 11:23:08 -08:00
Jihoon Son	b1c4a5de0d	Fix and improve doc for partitioning of local index (#7064 )	2019-02-13 11:20:52 -08:00
Jihoon Son	d42de574d6	Add an api to get all lookup specs (#7025 ) * Add an api to get all lookup specs * add doc	2019-02-08 11:05:59 -08:00
Jihoon Son	8e3a58f723	Improve druid.storage.sse.kms.keyId and druid.s3.protocol (#7012 ) * Improve druid.storage.sse.kms.keyId and druid.s3.protocol * fix article	2019-02-06 15:00:51 -08:00
Jihoon Son	75c70c2ccc	Add doc for S3 permissions settings (#7011 ) * Add doc for S3 permissions settings * add a comment about additional settings	2019-02-05 11:52:09 -08:00
Egor Riashin	97b6407983	maintenance mode for Historical (#6349 ) * maintenance mode for Historical forbidden api fix, config deserialization fix logging fix, unit tests * addressed comments * addressed comments * a style fix * addressed comments * a unit-test fix due to recent code-refactoring * docs & refactoring * addressed comments * addressed a LoadRule drop flaw * post merge cleaning up	2019-02-04 18:11:00 -08:00
Jonathan Wei	953b96d0a4	Add more sketch aggregator support in Druid SQL (#6951 ) * Add more sketch aggregator support in Druid SQL * Add docs * Tweak module serde register * Fix tests * Checkstyle * Test fix * PR comment * PR comment * PR comments	2019-02-02 22:34:53 -08:00
Surekha	7baa33049c	Introduce published segment cache in broker (#6901 ) * Add published segment cache in broker * Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set * Added a trueEquals to DataSegment class * Use separate interner for realtime and historical segments * Remove trueEquals as it's not used anymore, change log message * PR comments * PR comments * Fix tests * PR comments * Few more modification to * change the coordinator api * removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments * Change the poll behaviour to avoid multiple poll execution at same time * minor changes * PR comments * PR comments * Make the segment cache in broker off by default * Added a config to PlannerConfig * Moved MetadataSegmentView to sql module * Add doc for new planner config * Update documentation * PR comments * some more changes * PR comments * fix test * remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR * minor changes * some changes to initialization * use pollPeriodInMS * Add boolean cachePopulated to check if first poll succeeds * Remove poll from start() * take the log message out of condition in stop()	2019-02-02 22:27:13 -08:00
Justin Borromeo	6430ef8e1b	lol (#6985 )	2019-02-01 14:21:13 -08:00
Clint Wylie	7a5827e12e	bloom filter sql aggregator (#6950 ) * adds sql aggregator for bloom filter, adds complex value serde for sql results * fix tests * checkstyle * fix copy-paste	2019-02-01 13:54:46 -08:00
lxqfy	e45f9ea5e9	Update metrics.md (#6976 )	2019-02-01 13:40:44 -08:00
jorbay-au	852fe86ea2	Remove repeated word in indexing-service.md (#6983 )	2019-02-01 13:38:22 -08:00
Furkan KAMACI	185a7d4fc5	Updated definition and added link for Zookeeper connection string. (#6961 ) * Updated definition and added link for Zookeeper connection string. * Conflicts are merged.	2019-01-31 10:14:42 -08:00
Gian Merlino	54735a5ad1	Kafka indexing: Remove experimental notice. (#6970 )	2019-01-31 09:54:22 -08:00
Surekha	4c211ab2b4	update sys table docs (#6955 ) * update sys table docs * Capitalize SQL	2019-01-31 08:51:39 -08:00
Jonathan Wei	82137874ea	Add master/data/query server concepts to docs/packaging (#6916 ) * Add master/data/query server concepts to docs/packaging * PR comments * TOC and markdown fix * Update image legend * PR comment * More PR comments	2019-01-30 19:41:07 -08:00
Jihoon Son	d4fbbb8deb	Support protocol configuration for S3 (#6954 ) * Support protocol configuration for S3 * Add doc	2019-01-30 19:32:00 -08:00
Gian Merlino	edee576a7a	Add doc for druid.storage.useS3aSchema. (#6964 )	2019-01-30 10:26:37 -08:00
Clint Wylie	a6d81c0d16	Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397 ) * blooming aggs * partially address review * fix docs * minor test refactor after rebase * use copied bloomkfilter * add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests * add methods to BloomKFilter to get number of set bits, use in comparator, fixes * more docs * fix * fix style * simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets * oof, more fixes * more sane docs example * fix it * do the right thing in the right place * formatting * fix * avoid conflict * typo fixes, faster comparator, docs for comparator behavior * unused imports * use buffer comparator instead of deserializing * striped readwrite lock for buffer agg, null handling comparator, other review changes * style fixes * style * remove sync for now * oops * consistency * inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception * CardinalityBufferAggregator inspect selectors instead of selectorPluses * fix style * refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments * adjustment * fix teamcity error? * rename nil aggs to empty, change empty agg constructor signature, add comments * use stringutils base64 stuff to be chill with master * add aggregate combiner, comment	2019-01-29 20:05:17 +07:00
Justin Borromeo	8d70ba69cf	Fix broken link on select query doc page (#6933 ) * Fixed broken link * Typo fix	2019-01-28 17:02:21 -08:00
Clint Wylie	af3cbc3687	add bloom filter druid expression (#6904 ) * add "bloom_filter_test" druid expression to support bloom filters in ExpressionVirtualColumn and ExpressionDimFilter and sql expressions * more docs * use java.util.Base64, doc fixes	2019-01-28 08:41:45 -05:00
Navin Kumar	ae4dba7785	Fix Configuration options (#6884 ) Change `druid.metadata.postgres.` to `druid.metadata.postgres.ssl.`	2019-01-27 12:35:27 -08:00
Gian Merlino	7c5a06bb85	More docs on data modeling. (#6899 ) * More docs on data modeling. * Try to fix formatting. * Fix indentation. * More details and adjustments after feedback.	2019-01-27 11:33:21 -08:00
Janek Lasocki-Biczysko	89f2475369	Move ingest/kafka/* metrics into a separate section on the metrics docs (#6895 ) The `ingest/kafka/*` metrics were grouped together with metrics relevant to RealtimeMetricsMonitor, whereas they should be in their own section.	2019-01-28 00:11:53 +08:00
Jihoon Son	3b020fd81b	Improve doc for auto compaction (#6782 ) * Improve doc for auto compaction * address comments * address comments * address comments	2019-01-23 16:21:45 -08:00
Justin Borromeo	86e171a234	Doc change and commands tested command on v5 and v8 (#6886 )	2019-01-18 15:13:11 -08:00
Jonathan Wei	68f744ec0a	Fixed buckets histogram aggregator (#6638 ) * Fixed buckets histogram aggregator * PR comments * More PR comments * Checkstyle * TeamCity * More TeamCity * PR comment * PR comment * Fix doc formatting	2019-01-17 14:51:16 -08:00
lxqfy	f6dcd63084	Fixed the format of broker client configration (#6878 )	2019-01-16 22:57:50 -08:00
Dayue Gao	5b8a221713	Add SQL id, request logs, and metrics (#6302 ) * use SqlLifecyle to manage sql execution, add sqlId * add sql request logger * fix UT * rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc * add docs and more sql request logger impls * add UT for http and jdbc * fix forbidden use of com.google.common.base.Charsets * fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId * do not use default method in QueryMetrics interface * capitalize 'sql' everywhere in the non-property parts of the docs * use RequestLogger interface to log sql query * minor bugfixes and add switching request logger * add filePattern configs for FileRequestLogger * address review comments, adjust sql request log format * fix inspection error * try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider	2019-01-15 23:12:59 -08:00
Jonathan Wei	9a8bade2fb	Update approximate aggregators docs (#6848 )	2019-01-11 21:50:51 -08:00
Furkan KAMACI	55927bf8e3	Kafka version is updated (#6835 ) Update Kafka version in tutorial from 0.10.2.0 to 0.10.2.2	2019-01-10 17:58:40 -08:00
Jihoon Son	c35a39d70b	Add support maxRowsPerSegment for auto compaction (#6780 ) * Add support maxRowsPerSegment for auto compaction * fix build * fix build * fix teamcity * add test * fix test * address comment	2019-01-10 09:50:14 -08:00
Furkan KAMACI	ea973fee6b	Tranquility version is updated (#6824 )	2019-01-10 09:46:58 +08:00
dongyifeng	def823124c	add version comparator for StringComparator (#6745 ) * add version comparator for StringComparator * add more test case and docs	2019-01-08 17:17:03 -08:00
Benjamin Hopp	ef80c4e036	Update sql.md (#6821 ) Corrected defaults for druid.sql.avatica.maxStatementsPerConnection and druid.sql.avatica.maxConnections	2019-01-08 10:15:12 -08:00
Janek Lasocki-Biczysko	b88e6304c4	Fix broken link in ingestion/schema-design.md docs (#6810 )	2019-01-06 18:20:53 -08:00
David Glasser	c08f391605	statsd-emitter: support constant DogStatsD tags (#6791 ) PR #6605 added support to the statsd emitter for DogStatsD tags. This commit lets you specify "constant tags" in the config file which are included with every event. This is helpful if you are running in an environment where you cannot configure your datadog-agent with tags like "cluster name" --- eg, a Kubernetes cluster with a datadog-agent on each node and different Druid deployments in different namespaces but sharing the same datadog-agent daemonset. Also fix the name of an existing boolean getter to start with 'is'.	2019-01-04 15:35:37 +08:00
thomask	0e04acca43	Show how to include classpath in command (#6802 ) Would have saved me some time	2019-01-03 18:31:55 -08:00
Jihoon Son	9ad6a733a5	Add support segmentGranularity for CompactionTask (#6758 ) * Add support segmentGranularity * add doc and fix combination of options * improve doc	2019-01-03 17:50:45 -08:00
Mingming Qiu	6761663509	make kafka poll timeout can be configured (#6773 ) * make kafka poll timeout can be configured * add doc * rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS	2019-01-03 12:16:02 +08:00
Mingming Qiu	114a9fc38f	change propertyBase in ServerViewModule (#6774 )	2019-01-02 16:44:02 +08:00
Clint Wylie	67f832957b	add bloom filter operator to general sql docs (#6785 )	2018-12-31 11:30:33 -08:00
Joshua Sun	7c7997e8a1	Add Kinesis Indexing Service to core Druid (#6431 ) * created seekablestream classes * created seekablestreamsupervisor class * first attempt to integrate kafa indexing service to use SeekableStream * seekablestream bug fixes * kafkarecordsupplier * integrated kafka indexing service with seekablestream * implemented resume/suspend and refactored some package names * moved kinesis indexing service into core druid extensions * merged some changes from kafka supervisor race condition * integrated kinesis-indexing-service with seekablestream * unite tests for kinesis-indexing-service * various bug fixes for kinesis-indexing-service * refactored kinesisindexingtask * finished up more kinesis unit tests * more bug fixes for kinesis-indexing-service * finsihed refactoring kinesis unit tests * removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions * kinesis-indexing-service code cleanup and docs * merge #6291 merge #6337 merge #6383 * added more docs and reordered methods * fixd kinesis tests after merging master and added docs in seekablestream * fix various things from pr comment * improve recordsupplier and add unit tests * migrated to aws-java-sdk-kinesis * merge changes from master * fix pom files and forbiddenapi checks * checkpoint JavaType bug fix * fix pom and stuff * disable checkpointing in kinesis * fix kinesis sequence number null in closed shard * merge changes from master * fixes for kinesis tasks * capitalized <partitionType, sequenceType> * removed abstract class loggers * conform to guava api restrictions * add docker for travis other modules test * address comments * improve RecordSupplier to supply records in batch * fix strict compile issue * add test scope for localstack dependency * kinesis indexing task refactoring * comments * github comments * minor fix * removed unneeded readme * fix deserialization bug * fix various bugs * KinesisRecordSupplier unable to catch up to earliest position in stream bug fix * minor changes to kinesis * implement deaggregate for kinesis * Merge remote-tracking branch 'upstream/master' into seekablestream * fix kinesis offset discrepancy with kafka * kinesis record supplier disable getPosition * pr comments * mock for kinesis tests and remove docker dependency for unit tests * PR comments * avg lag in kafkasupervisor #6587 * refacotred SequenceMetadata in taskRunners * small fix * more small fix * recordsupplier resource leak * revert .travis.yml formatting * fix style * kinesis docs * doc part2 * more docs * comments * comments2 revert string replace changes * comments * teamcity * comments part 1 * comments part 2 * comments part 3 * merge #6754 * fix injection binding * comments * KinesisRegion refactor * comments part idk lol * can't think of a commit msg anymore * remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner * commmmmmmmmmments * extra error handling in KinesisRecordSupplier getRecords * comments * quickfix * typo * oof	2018-12-21 12:49:24 -07:00
Gian Merlino	7a09cde4de	Broker: Await initialization before finishing startup. (#6742 ) * Broker: Await initialization before finishing startup. In particular, hold off on announcing the service and starting the HTTP server until the server view and SQL metadata cache are finished initializing. This closes a window of time where a Broker could return partial results shortly after startup. As part of this, some simplification of server-lifecycle service announcements. This helps ensure that the two different kinds of announcements we do (legacy and new-style) stay in sync. * Remove unused imports. * Fix NPE in ServerRunnable.	2018-12-18 20:32:31 -08:00
Jihoon Son	2c380e3a26	Fix doc for automatic compaction (#6749 )	2018-12-17 11:44:33 -08:00
Jonathan Wei	c713116a75	Use @Coordinator leader client in CoordinatorRuleManager (#6729 )	2018-12-16 15:18:09 -08:00
Gian Merlino	04e7c7fbdc	FilteredRequestLogger: Fix start/stop, invalid delegate behavior. (#6637 ) * FilteredRequestLogger: Fix start/stop, invalid delegate behavior. Fixes two bugs: 1) FilteredRequestLogger did not start/stop the delegate. 2) FilteredRequestLogger would ignore an invalid delegate type, and instead silently substitute the "noop" logger. This was due to a larger problem with RequestLoggerProvider setup in general; the fix here is to remove "defaultImpl" from the RequestLoggerProvider interface, and instead have JsonConfigurator be responsible for creating the default implementations. It is stricter about things than the old system was, and is only willing to make a noop logger if it doesn't see any request logger configs. Otherwise, it'll raise a provision error. * Remove unneeded annotations.	2018-12-14 16:55:44 +08:00
Clint Wylie	4ec068642d	move parquet extension input formats up a level to `org.apache.druid.data.input.parquet.DruidParquetInputFormat` for `parquet` and `org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat` for `parquet-avro` (#6727 )	2018-12-13 16:33:42 -08:00
David Lim	f7bbee2e65	Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733 )	2018-12-13 11:47:20 -08:00
Vadim Ogievetsky	da4836f38c	Added titles and harmonized docs to improve usability and SEO (#6731 ) * added titles and harmonized docs * manually fixed some titles	2018-12-12 20:42:12 -08:00
Clint Wylie	55914687bb	Fix broken link in docs toc (#6728 ) Change 'peon.html' to the correct link, 'peons.html'. No redirect is needed because the file has always been 'peons', just an incorrect link was introduced in the toc here https://github.com/apache/incubator-druid/pull/6259/files#diff-45297643736c5fb6da0e92f2c3df5d68R89	2018-12-12 15:14:38 -08:00
Vincent Newkirk	cc44a4a28f	Correct Documentation for lowerStrict/upperStrict (#6707 ) The documentation for Bound filter's lowerStrict/upperStrict is incorrect. It is not consistent with the examples provided and actual behaviour of the bound filter. Correct this.	2018-12-06 10:14:50 -08:00
Mingming Qiu	607339003b	Add TaskCountStatsMonitor to monitor task count stats (#6657 ) * Add TaskCountStatsMonitor to monitor task count stats * address comments * add file header * tweak test	2018-12-04 13:37:17 -08:00
Clint Wylie	a1c9d0add2	autosize processing buffers based on direct memory sizing by default (#6588 ) * autosize processing buffers based on direct memory sizing * remove oops, more test * max 1gb autosize buffers, test, start of docs * fix oops * revert accidental change * print buffer size in exception * change the things	2018-12-03 18:40:02 -07:00
David Lim	e2bedab665	fix links to use relative references (#6696 )	2018-11-30 16:32:10 -08:00
David Lim	b332021c49	remove extensions from default configs that have configuration/library dependencies and update docs (#6694 )	2018-11-30 12:52:46 -08:00
rcgarcia74	9bf835b84f	remove #658 doc reference for Schema-less design (#6693 )	2018-11-30 12:53:57 -07:00
Jihoon Son	d6539abd0a	Fix overlord api and console (#6686 ) * Fix overlord APIs and console * remove getRunningTasksByDataSource * add missing path to isApplicable	2018-11-29 23:45:28 -08:00
Mingming Qiu	c5405bb592	emit maxLag/avgLag in KafkaSupervisor (#6587 ) * emit maxLag/totalLag/avgLag in KafkaSupervisor * modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility	2018-11-28 02:11:14 -08:00
Mingming Qiu	849ba867b2	fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory (#6656 )	2018-11-27 15:59:58 -08:00
Clint Wylie	efdec50847	bloom filter sql (#6502 ) * bloom filter sql support * docs * style fix * style fixes after rebase * use copied/patched bloomkfilter * remove context literal lookup function, changes from review * fix build * rename LookupOperatorConversion to QueryLookupOperatorConversion * remove doc * revert unintended change * add internal exception to bloom filter deserialization exception	2018-11-27 14:11:18 +08:00
Evans Hauser	03df481c9c	Docs: Fix wikipedia links in Ingestion:Rollup (#6659 ) The rendered site doesn't have automatic link detection, so we need to add these links in explicitly. This also fixes the Measure link, which included an extra `)` http://druid.io/docs/latest/ingestion/index.html#rollup	2018-11-23 16:28:05 -08:00
seoeun	22a5bf97a2	Fix issue that tasks tables in metadata storage are not cleared (#6592 ) * tasks tables in metadata storage are not cleared * address comments. remove tasklogs and revert obsolete changes * address comments. change comment and update doc. * address comments. update doc more detailed * address comments. remove redundant log and update doc more detailed. * address comments. update document	2018-11-22 11:50:31 +08:00
Jonathan Wei	e285b1103d	Use PasswordProvider for basic HTTP escalator (#6650 )	2018-11-21 07:34:15 -08:00
Caroline1000	a438a9b99c	fix typo in config page of docs (#6645 )	2018-11-19 16:32:58 -08:00
Deiwin Sarjas	e0d1dc5846	Support DogStatsD style tags in statsd-emitter (#6605 ) * Replace StatsD client library The [Datadog package][1] is a StatsD compatible drop-in replacement for the client library, but it seems to be [better maintained][2] and has support for Datadog DogStatsD specific features, which will be made use of in a subsequent commit. The `count`, `time`, and `gauge` methods are actually exactly compatible with the previous library and the modifications shouldn't be required, but EasyMock seems to have a hard time dealing with the variable arguments added by the DogStatsD library and causes tests to fail if no arguments are provided for the last String vararg. Passing an empty array fixes the test failures. [1]: https://github.com/DataDog/java-dogstatsd-client [2]: https://github.com/tim-group/java-statsd-client/issues/37#issuecomment-248698856 * Retain dimension key information for StatsD metrics This doesn't change behavior, but allows separating dimensions from the metric name in subsequent commits. There is a possible order change for values from `dimsBuilder.build().values()`, but from the tests it looks like it doesn't affect actual behavior and the order of user dimensions is also retained. * Support DogStatsD style tags in statsd-emitter Datadog [doesn't support name-encoded dimensions and uses a concept of _tags_ instead.][1] This change allows Datadog users to send the metrics without having to encode the various dimensions in the metric names. This enables building graphs and monitors with and without aggregation across various dimensions from the same data. As tests in this commit verify, the behavior remains the same for users who don't enable the `druid.emitter.statsd.dogstatsd` configuration flag. 
[1]: https://www.datadoghq.com/blog/the-power-of-tagged-metrics/#tags-decouple-collection-and-reporting * Disable convertRange behavior for DogStatsD users DogStatsD, unlike regular StatsD, supports floating-point values, so this behavior is unnecessary. It would be possible to still support `convertRange`, even with `dogstatsd` enabled, but that would mean that people using the default mapping would have some of the gauges unnecessarily converted. `time` is in milliseconds and doesn't support floating-point values.	2018-11-19 09:47:57 -08:00
Gian Merlino	7cd457f41c	Kafka: Add warning to doc for earlyMessageRejectionPeriod. (#6644 )	2018-11-18 15:47:38 -07:00
Benjamin Hopp	8a258d3a6a	Fix Hadoop Indexing doc to clarify segmentOutputPath only required for CLI indexer (#6636 ) * Updated hadoop indexing doc to reflect segmentOutputPath is only required when using CLI indexer; otherwise it must be NULL	2018-11-17 12:19:20 +08:00
Niketh Sabbineni	2ebdce20b1	Fix smile query documentation (#6620 )	2018-11-14 08:51:02 +08:00
Jihoon Son	cdae2fe7b5	Deprecate IntervalChunkingQueryRunner (#6591 ) * Deprecate IntervalChunkingQueryRunner * add doc * deprecate metric * fix doc	2018-11-14 06:33:27 +08:00
Gian Merlino	154b6fbcef	SQL: Add "POSITION" function. (#6596 ) Also add a "fromIndex" argument to the strpos expression function. There are some -1 and +1 adjustment terms due to the fact that the strpos expression behaves like Java indexOf (0-indexed), but the POSITION SQL function is 1-indexed.	2018-11-13 13:39:00 -08:00
Jihoon Son	7b262b7123	Remove unnecessary path param from auto compaction api (#6594 ) * Remove unnecessary path param from auto compaction api * fix ci	2018-11-13 09:46:13 -08:00
David Lim	afb239b17a	add missing license headers, in particular to MD files; clean up RAT … (#6563 ) * add missing license headers, in particular to MD files; clean up RAT exclusions * revert inadvertent doc changes * docs * cr changes * fix modified druid-production.svg	2018-11-13 09:38:37 -08:00
Clint Wylie	1224d8b746	overhaul 'druid-parquet-extensions' module, promoting from 'contrib' to 'core' (#6360 ) * move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql * fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays * remove leftover print * convert micro timestamp to millis * checkstyle * add ignore for .parquet and .parq to rat exclude * fix legit test failure from avro flattern behavior change * fix rebase * add exclusions to pom to cut down on redundant jars * refactor tests, add support for unwrapping lists for parquet-avro, review comments * more comment * fix oops * tweak parquet-avro list handling * more docs * fix style * grr styles	2018-11-05 21:33:42 -08:00
David Lim	23ad3d214c	fixup docs to download from Apache mirror, fixup tarball name and path, change references from quickstart/* to quickstart/tutorial/* (#6570 )	2018-11-01 21:47:29 -07:00
Caroline1000	26d992840c	correct default tier name (#6568 )	2018-11-01 17:51:13 -07:00
QiuMM	ddd15a6907	correct default value for maxTotalRows (#6566 )	2018-11-01 16:53:15 -07:00
Jihoon Son	a92c2a197b	Move supervisor APIs to api-reference (#6555 ) * Move supervisor APIs to api-reference * fix kafka-specific docs * add ingestion stats report	2018-11-01 13:10:05 -07:00
QiuMM	7b34662462	Period load/drop/broadcast rules should include the future by default (#6414 ) * Period load/drop/broadcast rules should include the future by default * address comments * adjust coordinator console and tweak docs * address comments * fix travis-ci	2018-11-01 09:43:34 -07:00
Jihoon Son	d2a533c7c7	Add doc for missing balancerComputeThreads configuration (#6561 ) * Add doc for missing balancerComputeThreads configuration * remove duplicate	2018-10-31 18:43:12 -07:00
taiii	b1159174b7	Update mysql.md (#6545 )	2018-10-30 14:01:32 -07:00
Jonathan Wei	8382764900	Remove unused bin/init script, conf-quickstart reference (#6520 )	2018-10-26 11:30:01 -07:00
Jonathan Wei	b2d9b6f23d	Allow custom TLS cert checks (#6432 ) * Allow custom TLS cert checks * PR comment * Checkstyle, PR comment	2018-10-24 16:31:52 -07:00
QiuMM	601183b4c7	Add period drop before rule (#6415 ) * Add period drop before rule * add license header * support period drop before rule in coordinator console * address comments	2018-10-24 12:44:30 -07:00
David Lim	822e564f54	include mysql-metadata-storage extension in distribution, but without… (#6497 ) * include mysql-metadata-storage extension in distribution, but without the GPL-licensed connector library * Install mysql connector package * use symlinks to avoid versioning issues * add documentation for fetching the mysql connector	2018-10-20 18:18:58 -07:00
QiuMM	f5f4171a45	QueryCountStatsMonitor: emit query/count (#6473 ) Let `QueryCountStatsMonitor` emit `query/count`, then I can monitor QPS of my services, or I have to count it by myself.	2018-10-19 10:15:02 -03:00
patelh	c780aacc03	Add ability to specify dbcp properties file (#6419 ) * Add ability to specify dbcp properties file * Address PR comments, use mock config, remove setter * Add documentation * APRC, updated docs with example file contents * APRC, add @Nullable, @VisibileForTesting, update doc * APRC, remove error log, use props directly as jackson binding * Remove unused files	2018-10-16 12:27:19 -07:00
QiuMM	85a89e2703	make druid node bind address configurable (#6464 ) * make druid node bind address configurable * fix tests * fix travis-ci	2018-10-15 14:19:40 -07:00
robertervin	95ab1ea737	Fix Empty InDimFilter Failure (#6330 ) * fix empty InDimFilter failure (#6101) * Add test case for empty values input * Add documentation for empty values in InDimFilter	2018-10-14 20:43:16 -07:00
Clint Wylie	84598fba3b	combine druid-api, druid-common, java-util into druid-core (#6443 ) * combine druid-api, druid-common, java-util * spacing	2018-10-14 20:37:37 -07:00
dongyifeng	b06ac54a5e	add PrefixFilteredDimensionSpec for multi-value dimensions (#6307 ) * add PrefixFilteredDimensionSpec for multi-value dimensions * add docs for PrefixFilteredDimensionSpec * remove unnecessary null handling * add null check to the result of NullHandling	2018-10-12 17:51:09 -07:00
vishnu rao	6567fff9e7	Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' header (#4033 ) * o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header * o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header o- if Accept header is absent, it defaults to Content-Type header * Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033 Minor change to a comment - restoring to previous wording * Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033 o- minor change to check for empty string	2018-10-12 14:29:14 -07:00
Atul Mohan	ab7b4798cc	Securing passwords used for SSL connections to Kafka (#6285 ) * Secure credentials in consumer properties * Merge master * Refactor property population into separate method * Fix property setter * Fix tests	2018-10-11 10:03:01 -07:00
QiuMM	f8f4526b16	Add suspend\|resume\|terminate all supervisors endpoints. (#6272 ) * ability to showdown all supervisors * add doc * address comments * fix code style * address comments * change ternary assignment to if statement * better docs	2018-10-10 21:41:59 -07:00
Clint Wylie	f7775d1db3	fixes for LookupReferencesManagerTest (#6444 ) * some fixes for LookupReferencesManagerTest * docs * formatting * more formatting fixes	2018-10-10 18:02:11 -07:00
Surekha	3a0a667fe0	Introduce SystemSchema tables (#5989 ) (#6094 ) * Added SystemSchema with following tables (#5989) * SEGMENTS table provides details on served and published segments * SERVERS table provides details on data servers * SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS * TASKS table provides details on tasks * Add documentation for system schema * Fix static-analysis warnings * Address PR comments Add unit tests Fix a test * Try to fix a test * Fix a bug around replica count * rename io.druid to org.apache.druid * Major change is to make tasks and segment queries streaming * Made tasks/segments stream to calcite instead of storing it in memory * Add num_rows to segments table * Refactor JsonParserIterator * Replace with closeable iterator * Fix docs, make num_rows column nullable, some unit test changes * make num_rows column type long, allow it to be null fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler * Filter null rows for segments table from Linq4j enumerable * change num_replicas datatype to long in segments table * Fix some tests and address comments * Doc updates, other PR comments * Update tests * Address comments * Add auth check * Update docs * Refactoring * Fix teamcity warning, change the getQueryableServer in TimelineServerView * Fix compilation after rebase * Use the stream API from AuthorizationUtils * Added LeaderClient interface and NoopDruidLeaderClient class * Revert "Added LeaderClient interface and NoopDruidLeaderClient class" This reverts commit `100fa46e39`. 
* Make the naming consistent to server_segments for the join table * Add ForbiddenException on auth check failure * Remove static block from SystemSchema * Try to fix a test in CalciteQueryTest due to rename of server_segments * Fix the json output format in the coordinator API * Add auth check in the segments API * Add null check to avoid NPE * Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark * Fix test failures, type long/BIGINT can be nullable * Revert long nullability to fix tests * Fix style for tests * PR comments * Address PR comments * Add the missing BytesAccumulatingResponseHandler class * Use Sequences.withBaggage in DruidPlanner * Fix docs, add comments * Close the iterator if hasNext returns false	2018-10-10 17:17:29 -07:00
QiuMM	d559dfecb2	replace deprecated druid.port by druid.plaintextPort in docs (#6427 )	2018-10-09 10:57:01 -07:00
Jihoon Son	88d23b77b7	Add support keepSegmentGranularity for automatic compaction (#6407 ) * Add support keepSegmentGranularity for automatic compaction * skip unknown dataSource * ignore single semgnet to compact * add doc * address comments * address comment	2018-10-07 16:48:58 -07:00
Jihoon Son	45aa51a00c	Add support hash partitioning by a subset of dimensions to indexTask (#6326 ) * Add support hash partitioning by a subset of dimensions to indexTask * add doc * fix style * fix test * fix doc * fix build	2018-10-06 16:45:07 -07:00
Roman Leventov	c5872bef41	Improve GC metrics documentation (#6423 )	2018-10-05 14:57:01 -07:00
Gian Merlino	244046fda5	SQL: Fix too-long headers in http responses. (#6411 ) Fixes #6409 by moving column name info from HTTP headers into the result body.	2018-10-01 18:13:08 -07:00
Jihoon Son	cb14a43038	Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask (#6393 ) * Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask * update doc and remove auto conversion * remove remaining doc * fix teamcity	2018-10-01 12:03:35 -07:00
Shiv Toolsidass	a56ffe6ab2	Added backpressure metric to docs and defaultMetricDimensions (#6405 ) * Added backpressure metric to docs and defaultMetricDimensions.json * Reworded description for backpressure metric in docs	2018-09-29 17:57:29 -07:00
adursun	6f44e568db	Add missing comma (#6399 )	2018-09-28 09:02:36 -07:00
QiuMM	47a6cca013	Add TimestampSpec format for microsecond (#6395 )	2018-09-27 09:38:44 -07:00
Jihoon Son	6fb503c073	Deprecate task audit logging (#6368 ) * Deprecate task audit logging * fix test * fix it test	2018-09-26 16:28:02 -07:00
Nishant Bangarwa	c9d281a2e9	Add ability to pass in Bloom filter from Hive Queries (#6222 ) * Bloom filter initial implementation fix checkstyle review comments Fix wierd failure review comments Revert "Fix wierd failure" This reverts commit a13a83ad7887e679f6d539191b52aeaaea85b613. * fix test * review comment	2018-09-26 16:04:26 -07:00

2388 Commits