druid

Commit Graph

Author	SHA1	Message	Date
Maytas Monsereenusorn	42359c93dd	Implement ANY aggregator (#9187 ) * Implement ANY aggregator * Add copyright headers * Add unit tests * fix BufferAggregator * Fix bug in BufferAggregator * hook up the SQL command * add check for buffer aggregator * Address comment * address comments * add docs * Address comments * add more tests for numeric columns that have null values when run in sql compatible null mode * fix checkstyle errors * fix failing tests * fix failing tests	2020-01-16 14:40:32 -08:00
Suneet Saldanha	92ac22d060	Link javaOpts to middlemanager runtime.properties docs (#9101 ) * Link javaOpts to middlemanager runtime.properties docs * fix broken link * reword config links	2020-01-15 21:22:49 -08:00
Suneet Saldanha	85a3d416b0	Tutorials use new ingestion spec where possible (#9155 ) * Tutorials use new ingestion spec where possible There are 2 main changes * Use task type index_parallel instead of index * Remove the use of parser + firehose in favor of inputFormat + inputSource index_parallel is the preferred method starting in 0.17. Setting the job to index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent of an index task Instead of using a parserSpec, dimensionSpec and timestampSpec have been promoted to the dataSchema. The format is described in the ioConfig as the inputFormat. There are a few cases where the new format is not supported * Hadoop must use firehoses instead of the inputSource and inputFormat * There is no equivalent of a combining firehose as an inputSource * A Combining firehose does not support index_parallel * fix typo	2020-01-15 14:08:29 -08:00
Jonathan Wei	d1500c1328	Update Kinesis resharding information about task failures (#9104 )	2020-01-07 15:44:48 -08:00
Jonathan Wei	58d337186b	Graduation update for ASF release process guide and download links (#9126 ) * Graduation update for ASF release process guide and download links * Fix release vote thread typo * Fix pom.xml	2020-01-06 15:00:33 -06:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jihoon Son	3c31493772	Add missing docs for http client configurations (#9054 ) * Add missing docs for http client configurations * fix typo * backticks	2019-12-19 17:41:04 -08:00
Chi Cao Minh	6178f05da6	Fail superbatch range partition multi dim values (#9058 ) * Fail superbatch range partition multi dim values Change the behavior of parallel indexing range partitioning to fail ingestion if any row had multiple values for the partition dimension. After this change, the behavior matches that of hadoop indexing. (Previously, rows with multiple dimension values would be skipped.) * Improve err msg, rename method, rename test class	2019-12-18 10:14:03 -08:00
Clint Wylie	6881535b48	docs - clarify cache parameters (#9020 )	2019-12-13 16:53:45 -08:00
Suneet Saldanha	3325da1718	Allow startup scripts to specify java home (#9021 ) * Allow startup scripts to specify java home The startup scripts now look for java in 3 locations. The order is from most related to druid to least, ie ${DRUID_JAVA_HOME} ${JAVA_HOME} ${PATH} * Update fn names and clean up code * final round of fixes * fix spellcheck	2019-12-12 21:36:00 -08:00
Himanshu	9236dd9467	optionally enable Jetty ForwardedRequestCustomizer (#9010 ) * optionally enable Jetty ForwardedRequestCustomizer * fix doc build	2019-12-12 17:00:08 -08:00
Benjamin Hopp	13c33c1766	Update architecture.md (#9015 )	2019-12-11 19:05:50 -08:00
Jihoon Son	e5e1e9c4ee	Fix broken master (#9005 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test	2019-12-11 15:56:36 -08:00
Parag Jain	24fe824055	add readiness endpoints to processes having initialization delays (#8841 )	2019-12-10 17:26:13 -08:00
Chi Cao Minh	3de7ab8523	DataSketches jars in core (#9003 ) Having DataSketches jars in core will allow potential improvements, for example: - Provide an alternative implementation of HLL: https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html - Range partitioning for native parallel batch indexing without having the user load extensions on the classpath Dev mailing list discussion: https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E	2019-12-10 14:02:34 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Vadim Ogievetsky	0330744793	Docs: bold Java 8 requirement (#8996 ) * bold Java 8 req * add warning box	2019-12-09 20:23:07 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	441515cb50	update dump-segment docs so example command works (#8998 ) * update dump-segment docs so example command works * not everyone uses bash	2019-12-07 06:36:46 -08:00
Jonathan Wei	c949a25210	Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982 ) * Add Druid input source and format * Inherit dims/metrics from segment * Add ingest segment firehose reindexing test * Remove unnecessary module * Fix unit tests, checkstyle * Add doc entry * Fix dimensionExclusions handling, add parallel index integration test * Add spelling exclusion * Address some PR comments * Checkstyle * wip * Address rest of PR comments * Address PR comments	2019-12-05 16:50:00 -08:00
Clint Wylie	5ecdf94d83	add 'prefixes' support to google input source (#8930 ) * add prefixes support to google input source, making it symmetrical-ish with s3 * docs * more better, and tests * unused * formatting * javadoc * dependencies * oops * review comments * better javadoc	2019-12-04 21:01:10 -08:00
Lucas Capistrant	8dd9a8cb15	Small doc fix for baseTaskDir conf (#8978 )	2019-12-04 14:07:03 -08:00
Clint Wylie	a48784a1fd	dropwizard-emitter doc fixes (#8988 )	2019-12-04 12:52:58 -08:00
Fangyuan Deng	187cf0dd3f	[Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988 ) * historical fast restart by lazy load columns metadata * delete repeated code * add documentation for druid.segmentCache.lazyLoadOnStart * fix unit test fail * fix spellcheck * update docs * update docs mentioning a catch	2019-12-03 09:47:01 -08:00
Jonathan Wei	00ce18a0ea	Additional Kinesis resharding fixes (#8870 ) * Additional Kinesis resharding fixes * Address PR comments * Remove unused method * Adjust SegmentTransactionalInsertAction null handling * Check for unchanged metadata on empty publish * Add logs for empty publish * Fix javadoc * Clear offset when invalid endOffsets are seen * Fix LGTM alert * Fix build * Add resharding note to Kinesis docs * Checkstyle * Spelling * Address PR comments * Checkstyle	2019-11-28 12:59:01 -08:00
Clint Wylie	4458113375	S3 input source (#8903 ) * add s3 input source for native batch ingestion * add docs * fixes * checkstyle * lazy splits * fixes and hella tests * fix it * re-use better iterator * use key * javadoc and checkstyle * exception * oops * refactor to use S3Coords instead of URI * remove unused code, add retrying stream to handle s3 stream * remove unused parameter * update to latest master * use list of objects instead of object * serde test * refactor and such * now with the ability to compile * fix signature and javadocs * fix conflicts yet again, fix S3 uri stuffs * more tests, enforce uri for bucket * javadoc * oops * abstract class instead of interface * null or empty * better error	2019-11-25 22:31:19 -08:00
Jihoon Son	a2e6de4b16	Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924 ) * Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor * Fix docs and javadoc * Add unit tests for large or small estimated num splits * add override	2019-11-23 01:38:08 -08:00
Clint Wylie	7250010388	add parquet support to native batch (#8883 ) * add parquet support to native batch * cleanup * implement toJson for sampler support * better binaryAsString test * docs * i hate spellcheck * refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest * add comment, fix some stuff * adjustments * fix accident * tweaks	2019-11-22 10:49:16 -08:00
SeKing	9955107e8e	RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461 )	2019-11-22 03:46:54 -08:00
Surekha	d628bebbd7	Make supervisor API similar to submit task API (#8810 ) * accept spec or dataSchema, tuningConfig, ioConfig while submitting task json * fix test * update docs * lgtm warning * Add original constructor back to IndexTask to minimize changes * fix indentation in docs * Allow spec to be specified in supervisor schema * undo IndexTask spec changes * update docs * Add Nullable and deprecated annotations * remove deprecated configs from SeekableStreamSupervisorSpec * remove nullable annotation	2019-11-20 10:04:41 -08:00
Clint Wylie	d67c3c7aed	document SQL compatible null handling mode (#8894 ) * document SQL compatible null handling mode * adjustments * fix docs * review changes	2019-11-20 06:52:20 -08:00
Clint Wylie	074a45219d	add google cloud storage InputSource for native batch (#8907 ) * add google cloud storage InputSource for native batch * rename * checkstyle * fix * fix spelling * review comments	2019-11-19 19:49:43 -08:00
Chi Cao Minh	8365bdf62a	Address security vulnerabilities (#8878 ) * Address security vulnerabilities Security vulnerabilities addressed by upgrading 3rd party libs: - Upgrade avro-ipc to 1.9.1 - sonatype-2019-0115 - Upgrade caffeine to 2.8.0 - sonatype-2019-0282 - Upgrade commons-beanutils to 1.9.4 - CVE-2014-0114 - Upgrade commons-codec to 1.13 - sonatype-2012-0050 - Upgrade commons-compress to 1.19 - CVE-2019-12402 - sonatype-2018-0293 - Upgrade hadoop-common to 2.8.5 - CVE-2018-11767 - Upgrade hadoop-mapreduce-client-core to 2.8.5 - CVE-2017-3166 - Upgrade hibernate-validator to 5.2.5 - CVE-2017-7536 - Upgrade httpclient to 4.5.10 - sonatype-2017-0359 - Upgrade icu4j to 55.1 - CVE-2014-8147 - Upgrade jackson-databind to 2.6.7.3: - CVE-2017-7525 - Upgrade jetty-http to 9.4.12: - CVE-2017-7657 - CVE-2017-7658 - CVE-2017-7656 - CVE-2018-12545 - Upgrade log4j-core to 2.8.2 - CVE-2017-5645: - Upgrade netty to 3.10.6 - CVE-2015-2156 - Upgrade netty-common to 4.1.42 - CVE-2019-9518 - Upgrade netty-codec-http to 4.1.42 - CVE-2019-16869 - Upgrade nimbus-jose-jwt to 4.41.1 - CVE-2017-12972 - CVE-2017-12974 - Upgrade plexus-utils to 3.0.24 - CVE-2017-1000487 - sonatype-2015-0173 - sonatype-2016-0398 - Upgrade postgresql to 42.2.8 - CVE-2018-10936 Note that if users are using JDBC lookups with postgres, they may need to update the JDBC jar used by the lookup extension. * Fix license for postgresql	2019-11-19 09:14:33 -08:00
Chi Cao Minh	d60978343a	Improve missing JDBC driver error for lookups (#8872 ) If the JDBC drivers are missing from the lookup extensions, throw an exception that directs the user how to resolve the issue. This change is a follow up to #8825.	2019-11-18 11:42:38 -08:00
Jihoon Son	1611792855	Add InputSource and InputFormat interfaces (#8823 ) * Add InputSource and InputFormat interfaces * revert orc dependency * fix dimension exclusions and failing unit tests * fix tests * fix test * fix test * fix firehose and inputSource for parallel indexing task * fix tc * fix tc: remove unused method * Formattable * add needsFormat(); renamed to ObjectSource; pass metricsName for reader * address comments * fix closing resource * fix checkstyle * fix tests * remove verify from csv * Revert "remove verify from csv" This reverts commit `1ea7758489`. * address comments * fix import order and javadoc * flatMap * sampleLine * Add IntermediateRowParsingReader * Address comments * move csv reader test * remove test for verify * adjust comments * Fix InputEntityIteratingReader * rename source -> entity * address comments	2019-11-15 09:22:09 -08:00
Clint Wylie	cc54b2a9df	support for array expressions in TransformSpec with ExpressionTransform (#8744 ) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops	2019-11-13 11:04:37 -08:00
fst0	80dbf44fca	Add reference to druid.storage.type (#8857 ) * Add reference to `druid.storage.type` This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct. * Update s3.md Add global storage parameter to knob table. * Update s3.md	2019-11-13 10:03:41 -08:00
Lucas Capistrant	a066cc5648	Fix groupMapping endpoint URIs in druid-basic-security doc (#8847 )	2019-11-12 21:12:34 +05:30
Jonathan Wei	75ea0d592a	Add more datasketches doubles sketch SQL functions (#8843 ) * Add more datasketches doubles sketch SQL postaggs * style and lgtm	2019-11-08 18:05:06 -08:00
Gian Merlino	0e8c3f74d0	SQL: EARLIEST, LATEST aggregators. (#8815 ) * SQL: EARLIEST, LATEST aggregators. I chose these names instead of FIRST, LAST because those are already reserved functions in Calcite that mean something different. I think these are also better names anyway. * Finalify. * SQL updates. * Adjust aggregator calls. * Validations, test updates. * Review docs.	2019-11-08 16:29:25 -08:00
Clint Wylie	7aafcf8bca	parallel broker merges on fork join pool (#8578 ) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt	2019-11-07 11:58:46 -08:00
Jad Naous	ce3c0dae4d	Add note on JDBC libs for lookups (#8825 ) * Add note on JDBC libs for lookups * Fix directory and additional "the"	2019-11-06 13:31:26 -08:00
Himanshu	5adc8212b4	add documentation for druid docker and k8s operator (#8802 ) * add documentation for druid docker and k8s operator * address review comment and add Kubernetes to spelling file	2019-11-06 12:56:21 -08:00
Tijo Thomas	27acdbd2b8	'hadoop fs' command is deprecated . The new approach is to use hdfs command . Replacing 'hadoop fs' command with 'hdfs dfs' (#8762 )	2019-11-01 04:42:10 +05:30
Giuseppe Martino	9c171e2b1f	Message rejection absolute date (#8656 ) * Add option lateMessageRejectionStartDate * Use option lateMessageRejectionStartDate * Fix tests * Add lateMessageRejectionStartDate to kafka indexing service * Update tests kafka indexing service * Fix tests for KafkaSupervisorTest * Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig * Fix var name * Update documentation * Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.	2019-10-31 15:13:02 -07:00
Clint Wylie	3ff5e02237	remove select query (#8739 ) * remove select query * thanks teamcity * oops * oops * add back a SelectQuery class that throws RuntimeExceptions linking to docs * adjust text * update docs per review * deprecated	2019-10-30 19:29:56 -07:00
Gian Merlino	7605c23354	Remove Tranquility configs and certain doc references. (#8793 ) Since it hasn't received updates or community interest in a while, it makes sense to de-emphasize it in the distribution and most documentation (outside of simple mentions of its existence).	2019-10-30 16:30:16 -07:00
Gian Merlino	c922d2c3c9	Use bundled ZooKeeper in tutorials. (#8792 )	2019-10-30 16:17:28 -07:00
Gian Merlino	aa81253cf4	Fix typos. (#8767 )	2019-10-28 12:47:01 -07:00
Gian Merlino	b65d2ac648	Add HDFS firehose (#8754 ) * Add HDFS firehose. * Tests, support for lists of paths. * Fixups. * Update list of firehoses. * Wildcards is a word.	2019-10-28 08:07:38 -07:00
Vadim Ogievetsky	f9b94a5db1	Docs: remove self link (#8760 ) This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.	2019-10-27 22:33:22 -07:00
Clint Wylie	09f92818d4	update druid expression docs to indicate that array functions do not work at indexing time (#8734 ) * update druid expression docs to indicate that array functions are not supported in transformSpec * fix unrelated spelling check	2019-10-24 22:04:08 -07:00
Eyal Yurman	14e33428f0	Moving Average extention: Add Sum averagers (#8511 ) * Add sum averagers. * avoid casting double to long.	2019-10-24 16:37:24 -07:00
Vadim Ogievetsky	cc3650ee3b	fix doc headers (#8729 )	2019-10-24 11:17:39 -07:00
Jihoon Son	f5b9bf5525	Cluster-wide configuration for query vectorization (#8657 ) * Cluster-wide configuration for query vectorization * add doc * fix build * fix doc * rename to QueryConfig and add javadoc * fix checkstyle * fix variable names	2019-10-23 21:44:28 +08:00
David Glasser	b453fda251	docs: clarify native batch ingestion w/ overlapping segments (#8720 ) I was confused by a paragraph in the docs that I myself wrote!	2019-10-22 21:01:56 -07:00
Jad Naous	2ab43aa688	Update tutorial-kerberos-hadoop.md (#8689 ) * Update tutorial-kerberos-hadoop.md Fix up what looks like a bad merge. * Update tutorial-kerberos-hadoop.md Fix spelling issues	2019-10-22 14:40:41 -07:00
Abhishek Radhakrishnan	42cfe679f1	Update query result timestamp to match query intervals. (#8717 )	2019-10-22 14:39:47 -07:00
Surekha	e919eccc4b	Update docs to add metadataSegment configs (#8708 ) * Add metadataSegment configs to docs * rearrange in alphabetical order	2019-10-22 01:19:36 -07:00
Kamal Gurala	3ed5f9698a	gcs prefix doc fix (#8699 )	2019-10-21 08:29:54 -07:00
Surekha	98f59ddd7e	Add `sys.supervisors` table to system tables (#8547 ) * Add supervisors table to SystemSchema * Add docs * fix checkstyle * fix test * fix CI * Add comments * Fix javadoc teamcity error * comments * fix links in docs * fix links * rename fullStatus query param to system and remove it from docs	2019-10-18 15:16:42 -07:00
Jonathan Wei	d88075237a	Add initial SQL support for non-expression sketch postaggs (#8487 ) * Add initial SQL support for non-expression sketch postaggs * Checkstyle, spotbugs * checkstyle * imports * Update SQL docs * Checkstyle * Fix theta sketch operator docs * PR comments * Checkstyle fixes * Add missing entries for HLL sketch module * PR comments, add round param to HLL estimate operator, fix optional HLL param	2019-10-18 14:59:44 -07:00
Jihoon Son	30c15900be	Auto compaction based on parallel indexing (#8570 ) * Auto compaction based on parallel indexing * javadoc and doc * typo * update spell * addressing comments * address comments * fix log * fix build * fix test * increase default max input segment bytes per task * fix test	2019-10-18 13:24:14 -07:00
Mingming Qiu	2c758ef5ff	Support assign tasks to run on different categories of MiddleManagers (#7066 ) * Support assign tasks to run on different tiers of MiddleManagers * address comments * address comments * rename tier to category and docs * doc * fix doc * fix spelling errors * docs	2019-10-17 12:57:19 -07:00
Jad Naous	d54d2e1627	Update segments.md (#8693 ) Make bullet numbers clearer with parantheses, fix last reference to 2 being interpreted as a bullet point.	2019-10-17 11:55:23 -07:00
Jad Naous	9f4e11df32	Update tutorial-rollup.md (#8687 ) At this point there hasn't yet been an explanation in the tutorial of what "segments" are	2019-10-16 20:08:09 -06:00
Jonathan Wei	89ce6384f5	More Kinesis resharding adjustments (#8671 ) * More Kinesis resharding adjustments * Fix TC inspection * Fix comment' * Adjust comment, small refactor * Make repartition transition time configurable * Add spellcheck exclusion * Spelling fix	2019-10-15 23:19:17 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Mitch Lloyd	1a78a0c98a	Add credentials for ECS (#8651 ) * Add credentials for ECS * Fix import order * Update S3 authentication methods table * Update .spelling for new documentation	2019-10-12 09:12:14 -07:00
Abhishek Radhakrishnan	d87840d894	Minor updates to documentation. (#8665 )	2019-10-12 09:11:03 -07:00
Jihoon Son	96d8523ecb	Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571 ) * IOConfig for compaction task * add javadoc, doc, unit test * fix webconsole test * add spelling * address comments * fix build and test * address comments	2019-10-09 11:12:00 -07:00
Clint Wylie	8bda3afea4	fix spelling errors triggered by another doc PR (#8653 )	2019-10-08 23:43:58 -07:00
Nishant Bangarwa	0853273091	Add tier based usage metrics for historical nodes to help with autoscaling (#8636 ) * Add tier based usage metrics for historical nodes to help with druid historical autoscaling Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful - tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier tier/historical/count - no. of historical nodes available in each tier. Dimension - tier tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier * fix unit test failures	2019-10-08 19:55:32 -07:00
Mohammad J. Khan	18758f5228	Support LDAP authentication/authorization (#6972 ) * Support LDAP authentication/authorization * fixed integration-tests * fixed Travis CI build errors related to druid-security module * fixed failing test * fixed failing test header * added comments, force build * fixes for strict compilation spotbugs checks * removed authenticator rolling credential update feature * removed escalator rolling credential update feature * fixed teamcity inspection deprecated API usage error * fixed checkstyle execution error, removed unused import * removed cached config as part of removing authenticator rolling credential update feature * removed config bundle entity as part of removing authenticator rolling credential update feature * refactored ldao configuration * added support for SSLContext configuration and TLSCertificateChecker * removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer * Separate out authorizer checks between metadata-backed store user and LDAP user/groups * refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks * fixes build issue * final review comments updates * final review comments updates * fixed LGTM and spellcheck alerts * Fixed Avatica auth failure error message check * Updated metadata credentials validator exception message string, replaced DB with metadata store	2019-10-08 17:08:27 -07:00
Clint Wylie	2f20799868	merge recommendations into basic-cluster-tuning, add additional info (#8649 ) * merge recommendations into basic-cluster-tuning, add additional info * stupid sidebar	2019-10-08 16:33:54 -07:00
Himanshu	c078ed40fd	groupBy query: optional limit push down to segment scan (#8426 ) * groupBy query: optional limit push down to segment scan * make segment level limit push down configurable * fix teamcity errors * fix segment limit pushdown flag handling on query level config override * use equals for comparator check * fix sql and null handling * fix unused imports * handle null offset in NullableValueGroupByColumnSelectorStrategy for buffer comparator similar to RowBasedGrouperHelper.NullableRowBasedKeySerdeHelper	2019-10-08 15:35:07 -07:00
Lucas Capistrant	d801ce2f29	Update rollup table to properly reflect 0.16.0 (#8638 ) This table stated that `index_parallel` tasks were best-effort only. However, this changed with #8061 and this documentation update was simply missed.	2019-10-07 12:37:15 -07:00
Xavier Léauté	1d42551d95	Fix statsd types (#8628 ) * fix segment underReplicated/unavailable counts to be gauges instead of counters * fix jvm/gc/cpu to be a counter instead of timre jvm/gc/cpu represents the total cpu time spent for multiple gc invocations, not the time spent in each gc cycle. the number needs to be divided by jvm/gc/count to get the average gc time per cycle * update docs * fix spellcheck	2019-10-06 14:14:09 -07:00
Parag Jain	f0d74b240d	password provider for basic authentication of HttpEmitterConfig (#8618 )	2019-10-02 15:59:17 -07:00
Nishant Bangarwa	8537fbeca7	Implementing dropwizard emitter for druid (#7363 ) * Implementing dropwizard emitter for druid making metric manager and alert emitters as optional * Refactor and make things work more improvements improve docs refactrings * Fix teamcity inspections * review comments * more review comments * add limit to max number of gauges * update pom version * fix pom * review comments * review comment * review comments * fix broken doc link review comments review comments * review comments * fix checkstyle * more spell check fixes * fix travis failures	2019-10-01 14:59:30 -07:00
pdeva	db65068c42	add reference to indexer nodes (#8607 )	2019-09-30 16:45:33 -06:00
Sashidhar Thallam	51a7235ebc	Making optimal usage of multiple segment cache locations (#8038 ) * #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages	2019-09-28 00:17:44 -06:00
Himanshu	9f1f5e115c	doubleMean aggregator to be used at query time (#8459 ) * doubleMean aggregator for computing mean * make docs * build fixes * address review comment: handle null args	2019-09-26 08:04:33 -07:00
Nishant Bangarwa	a75ddaad9e	Add TrustedDomain Authenticator (#8248 ) * Add TrustedDomain Authenticator update javadoc Add nullable annotations Add cautionary note fix travis failure * add IP to spell checker	2019-09-25 11:25:03 -07:00
Rye	f2a444321b	Added live reports for Kafka and Native batch task (#8557 ) * Added live reports for Kafka and Native batch task * Removed unused local variables * Added the missing unit test * Refine unit test logic, add implementation for HttpRemoteTaskRunner * checksytle fixes * Update doc descriptions for updated API * remove unnecessary files * Fix spellcheck complaints * More details for api descriptions	2019-09-23 21:08:36 -07:00
Vadim Ogievetsky	52f3f2c229	fix docs version interpolation (#8568 )	2019-09-22 17:38:55 -07:00
Vadim Ogievetsky	94298f7809	Update Kafka loading docs to use the streaming data loader (#8544 ) * fix redirects * remove useless page * fix Single server reference configurations formatting * update batch data loading * update Kafka docs * fix typos and tests * add more links * fix spelling	2019-09-22 15:00:52 -07:00
Chi Cao Minh	aeac0d4fd3	Adjust defaults for hashed partitioning (#8565 ) * Adjust defaults for hashed partitioning If neither the partition size nor the number of shards are specified, default to partitions of 5,000,000 rows (similar to the behavior of dynamic partitions). Previously, both could be null and cause incorrect behavior. Specifying both a partition size and a number of shards now results in an error instead of ignoring the partition size in favor of using the number of shards. This is a behavior change that makes it more apparent to the user that only one of the two properties will be honored (previously, a message was just logged when the specified partition size was ignored). * Fix test * Handle -1 as null * Add -1 as null tests for single dim partitioning * Simplify logic to handle -1 as null * Address review comments	2019-09-21 20:57:40 -07:00
Chi Cao Minh	99b6eedab5	Rename partition spec fields (#8507 ) * Rename partition spec fields Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment and maxRowsPerSegment in favor of targetPartitionSize and maxSegmentSize. Consistent and clearer names are easier for users to understand and use. Also fix various IntelliJ inspection warnings and doc spelling mistakes. * Fix test * Improve docs * Add targetRowsPerSegment to HashedPartitionsSpec	2019-09-20 14:59:18 -06:00
Xavier Léauté	e184d24a74	add support for dogstatsd events in statsd-emitter (#8546 ) * add support for dogstatsd events in statsd-emitter * add option to turn on alert events (off by default) * updated docs	2019-09-19 08:12:30 -07:00
Chi Cao Minh	7dcbaca658	Spellcheck docs (#8548 ) * Spellcheck docs Fix spelling mistakes in docs and add CI job for running spellcheck on docs. * Add missing license header	2019-09-17 12:47:30 -07:00
Vadim Ogievetsky	0490909ab3	Web console: Update web console docs for 0.16.0 (#8530 ) * Update webconsole docs * home view * fix annotation typo	2019-09-13 09:09:36 -07:00
Clint Wylie	75978e5b98	move google ext docs from contrib to core (#8512 ) * move google ext docs from contrib to core * fix links * revert unintended change * more links, add note to example ext doc that it was removed, unlink from sidebar	2019-09-12 09:40:39 -07:00
Jonathan Wei	0145642d8b	Move router/indexer config/API docs to main pages (#8510 ) * Move router/indexer config/API docs to main pages * Restore missing properties, fix typo * Use sentence casing * Fix broken link	2019-09-11 21:42:58 -07:00
Clint Wylie	fb078eea1e	fix web-console build in src distribution, fix kafka doc minimum version (#8502 )	2019-09-10 21:01:07 -07:00
Chi Cao Minh	14a8613d69	Exit JVM on curator unhandled errors (#8458 ) * Exit JVM on curator unhandled errors If an unhandled error occurs when curator is talking to ZooKeeper, exit the JVM in addition to stopping the lifecycle to prevent the process from being left in a zombie state. With this change, BoundedExponentialBackoffRetryWithQuit is no longer needed as when curator exceeds the configured retries, it triggers its unhandled error listeners. A new "connectionTimeoutMs" CuratorConfig setting is added mostly to facilitate testing curator unhandled errors, but it may be useful for users as well. * Address review comments	2019-09-06 16:43:59 -07:00
Clint Wylie	fd58fbc8d3	fix statds dogstatsdServiceAsTag docs example to match behavior (#8477 )	2019-09-05 19:05:25 -07:00
SeKing	6a6893b406	Fix operator mistake of expression OR (#8452 ) * Add realization for updating version of derived segments in MaterializedView * add unit test, and change code style for the sake of ease of understanding * fix document's mistake of expression	2019-09-04 21:27:18 -07:00
Lucas Capistrant	bfb02f09f8	Add druid.segmentCache.numBootstrapThreads back to the docs (#8462 )	2019-09-04 20:27:17 -07:00
legendtkl	0be4a41c06	Website Doc: fix bash command (#8442 ) * fix "gunzip -k" to "gunzip -c"	2019-08-30 22:22:09 -07:00
Clint Wylie	3baf31e9a8	add documentation for group by array based result format (#8416 )	2019-08-28 08:30:31 -07:00
Jonathan Wei	c626452b47	Add nano-quickstart single server example configuration (#8390 ) * Add nano-quickstart single server example configuration * Use two workers * Shrink processing buffers	2019-08-24 22:07:20 -07:00
Furkan KAMACI	02fe3db911	Zookeeper version is updated. (#8363 ) * Zookeeper version is updated. * Zookeeper version is updated at licenses.yaml * licenses.yaml is updated and dependencies are fixed to make the project successfully build. * Zookeeper versions are fixed at licenses.yaml	2019-08-24 22:00:43 -07:00
Jihoon Son	95fa609615	Fix wrong partitionsSpec type names in the document (#8297 ) * Fix wrong type names for partitionsSpec * add unit tests; add json properties for backward compatibility * beautify conf names * remove maxRowsPerSegment from hashed partitionsSpec * fix doc build	2019-08-23 13:44:58 -07:00
Clint Wylie	7749571a7f	order and add more ports to hadoop docker container in hadoop indexing tutorial (#8329 ) LGTM	2019-08-23 15:43:06 -05:00
Surekha	cf2a2dd917	Add group_id to the sys.tasks table (#8304 ) * Add group_id to overlord tasks API and sys.tasks table * adjust test * modify docs * Make groupId nullable * fix integration test * fix toString * Remove groupId from TaskInfo * Modify docs and tests * modify TaskMonitorTest	2019-08-22 15:28:23 -07:00
Clint Wylie	010f70b371	autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306 ) * migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time * +x * move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml * add missing hdfs-storage dependencies * revert to old syntax, fixes * formatting * update notices for recently updated dependencies	2019-08-21 12:46:27 -07:00
Gian Merlino	d007477742	Docusaurus build framework + ingestion doc refresh. (#8311 ) * Docusaurus build framework + ingestion doc refresh. * stick to npm instead of yarn * fix typos * restore some _bin * Adjustments. * detect and fix redirect anchors * update anchor lint * Web-console: remove specific column filters (#8343) * add clear filter * update tool kit * remove usless check * auto run * add % * Fix resource leak (#8337) * Fix resource leak * Patch comments * Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234) * Fixes from PR review. * Fix more anchors. * Preamble nix. * Fix more anchors, headers * clean up placeholder page * add to website lint to travis config * better broken link checking * travis fix * Fixed more broken links * better redirects * unfancy catch * fix LGTM error * link fixes * fix md issues * Addl fixes	2019-08-20 21:48:59 -07:00
Fokko Driesprong	d5a19675dd	Remove fromPigAvroStorage from the docs (#8340 ) This one has been deprecated a while ago	2019-08-20 16:34:55 -07:00
Jonathan Wei	dd2e53baf4	Clarify Avro decoder docs (#8302 )	2019-08-19 15:37:18 -05:00
Jihoon Son	31af4eb9ad	Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324 )	2019-08-16 15:57:13 -07:00
Jihoon Son	5dac6375f3	Add support for parallel native indexing with shuffle for perfect rollup (#8257 ) * Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks * kill runner when it's ready * add comment * kill run thread * fix test * Take closeable out of Appenderator * add javadoc * fix test * fix test * update javadoc * add javadoc about killed task * address comment * Add support for parallel native indexing with shuffle for perfect rollup. * Add comment about volatiles * fix test * fix test * handling missing exceptions * more clear javadoc for stopGracefully * unused import * update javadoc * Add missing statement in javadoc * address comments; fix doc * add javadoc for isGuaranteedRollup * Rename confusing variable name and fix typos * fix typos; move fetch() to a better home; fix the expiration time * add support https	2019-08-15 17:43:35 -07:00
Jihoon Son	eeae5d9365	Add a warning about experimental segment locking (#8301 ) * Add a warning about experimental segment locking * fix typo	2019-08-15 16:07:59 -07:00
Jihoon Son	a5c9c2950f	Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274 ) * Add missing tuningConfigs for auto compaciton * Add doc * add test	2019-08-13 14:10:26 -05:00
Alexandre Yang	6b4d028b96	[statsd-emitter] Add config to send Druid process/service as tag (#8238 ) * [statsd-emitter] Add serviceAsTag option * [statsd-emitter] Refactor serviceAsTag option * [statsd-emitter] Update statsd.md * [statsd-emitter] add default prefix * [statsd-emitter] update statsd.md * [statsd-emitter] Remove extra spaces * [statsd-emitter] Improve docs for config `dogstatsdServiceAsTag` * [statsd-emitter] Simplify equals() for StatsDEmitterConfig.java * [statsd-emitter] Add @Nullable for StatsDEmitterConfig.java	2019-08-12 13:18:44 -07:00
Nathan	b28e252d9a	Minor Spelling Error (#8277 ) * Minor Spelling Error * Update mySQL password in docs /extensions-core/mysql update druid.metadata.storage.connector.password	2019-08-09 16:06:02 -05:00
Jonathan Wei	e88bbe71c0	Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255 )	2019-08-07 23:04:07 -07:00
Jonathan Wei	5e57492298	Add docs for CliIndexer as an experimental feature (#8245 ) * Experimental CliIndexer docs * PR comments	2019-08-06 15:57:17 -07:00
Lucas Capistrant	e252abedc5	Enable toggling request logging on/off for different query types (#7562 ) * Enable ability to toggle SegmentMetadata request logging on/off * Move SegmentMetadata query log filter to FilteredRequestLogger * Update documentation to reflect the segment metadata flag moving to the filtered request logger * Modify patch to allow blacklist of query types to not log to request logger * Address styling and naming requests following latest code review * Fix indentation on multiple locations per Druid style rules	2019-08-06 15:47:30 +03:00
Samarth Jain	93cf9d4ad4	SQL support for t-digest based sketch aggregators (#8100 ) * SQL support for t-digest based sketch aggregators * Fix teamcity errors * Add missing dependencies * Remove unused dependency * Address code review comments * Add checks for compression param	2019-08-05 12:01:42 -07:00
Jihoon Son	1ee828ff49	Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173 ) * Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking * add more test * javadoc for missingIntervalsInOverwriteMode * Fix test * Address comments * avoid spotbugs	2019-08-02 20:30:05 -07:00
Chi Cao Minh	4bd3bad8ba	Add IPv4 SQL functions (#8223 ) * Add IPv4 SQL functions New SQL functions for filtering IPv4 addresses: - IPV4_MATCH: Check if IP address belongs to a subnet - IPV4_PARSE: Convert string IP address to integer - IPV4_STRINGIFY: Convert integer IP address to string These are the SQL analogs of the druid expressions with the same name. Filtering is more efficient when operating on IP addresses as integers instead of strings. * Refactor operator conversions into named constants	2019-08-01 21:29:58 -07:00
Clint Wylie	01c8c82982	correct kerberos doc extension load list (#8224 )	2019-08-01 17:03:25 -07:00
Chi Cao Minh	7783b31846	Add IPv4 druid expressions (#8197 ) * Add IPv4 druid expressions New druid expressions for filtering IPv4 addresses: - ipv4address_match: Check if IP address belongs to a subnet - ipv4address_parse: Convert string IP address to long - ipv4address_stringify: Convert long IP address to string These expressions operate on IP addresses represented as either strings or longs, so that they can be applied to dimensions with mixed representation of IP addresses. The filtering is more efficient when operating on IP addresses as longs. In other words, the intended use case is: 1) Use ipv4address_parse to convert to long at ingestion time 2) Use ipv4address_match to filter (on longs) at query time 3) Use ipv4adress_stringify to convert to (readable) string at query time * Fix licenses and null handling * Simplify IPv4 expressions * Fix tests * Fix check for valid ipv4 address string	2019-08-01 11:45:04 -07:00
Surekha	f0ecdfee30	Fix `is_realtime` column behavior in sys.segments table (#8154 ) * Fix is_realtime flag * make variable final * minor changes * Modify is_realtime behavior based on review comment * Fix UT	2019-07-31 22:26:49 -06:00
Nathan	716ce7fdc7	Spelling Error (#8206 )	2019-07-31 10:43:11 -07:00
Jihoon Son	385f492a55	Use PartitionsSpec for all task types (#8141 ) * Use partitionsSpec for all task types * fix doc * fix typos and revert to use isPushRequired * address comments * move partitionsSpec to core * remove hadoopPartitionsSpec	2019-07-30 17:24:39 -07:00
Clint Wylie	653b558134	sql firehose and firehose doc adjustments (#8067 ) * firehose doc adjustments * fix typo * additional information on parser types in ingestion docs * clarify ingest segment firehose docs, add sql firehose examples to sql extension pages * fixit * make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers * transforms * fix tests	2019-07-30 15:28:10 -07:00
Jonathan Wei	640b7afc1c	Add CliIndexer process type and initial task runner implementation (#8107 ) * Add CliIndexer process type and initial task runner implementation * Fix HttpRemoteTaskRunnerTest * Remove batch sanity check on PeonAppenderatorsManager * Fix paralle index tests * PR comments * Adjust Jersey resource logging * Additional cleanup * Fix SystemSchemaTest * Add comment to LocalDataSegmentPusherTest absolute path test * More PR comments * Use Server annotated with RemoteChatHandler * More PR comments * Checkstyle * PR comments * Add task shutdown to stopGracefully * Small cleanup * Compile fix * Address PR comments * Adjust TaskReportFileWriter and fix nits * Remove unnecessary closer * More PR comments * Minor adjustments * PR comments * ThreadingTaskRunner: cancel task run future not shutdownFuture and remove thread from workitem	2019-07-29 17:06:33 -07:00
Jihoon Son	61f4abece4	Add more warning to the doc for resetOffsetAutomatically (#8153 ) * Add more warnings to the doc for resetOffsetAutomatically * fix kinesis doc * fix typos * revise the description * capital * capitalize	2019-07-24 17:37:32 -07:00
Magnus Henoch	c87b47e0fa	More documentation formatting fixes (#8149 ) Add empty lines before bulleted lists and code blocks, to ensure that they show up properly on the web site. See also #8079.	2019-07-24 15:26:03 -07:00
Clint Wylie	b8b22b7aaa	fix references to bin/supervise in tutorial docs (#8087 )	2019-07-23 15:05:01 -07:00
Clint Wylie	83514958db	remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116 ) * remove unecessary lock in ForegroundCachePopulator leading to a lot of contention * mutableboolean, javadocs,document some cache configs that were missing * more doc stuff * adjustments * remove background documentation	2019-07-23 10:57:59 -07:00
Sashidhar Thallam	ea4bad7836	Druid SQL EXTRACT time function - adding support for additional Time Units (#8068 ) * 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935 * 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds * Fixing testInformationSchemaTables test * Fixing failing tests in DruidAvaticaHandlerTest * Adding cannotVectorize() call before the test * Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes. * Adding MILLISECOND in test case * Adding support DECADE and MILLENNIUM, updating test case and documentation * Fixing expression eval for DECADE and MILLENIUM	2019-07-19 20:38:32 -07:00
Roman Leventov	ceb969903f	Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653 ) * Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource * Style fixes * Unused imports * Fix tests * Fix style * Comments * Comment fix * Remove unresolvable Javadoc references; address comments * Add comments to ImmutableDruidDataSource * Merge with master * Fix bad web-console merge * Fixes in api-reference.md * Rename in DruidCoordinatorRuntimeParams * Fix compilation * Residual changes	2019-07-17 17:18:48 +03:00
Magnus Henoch	179253a2fc	Fix documentation formatting (#8079 ) The Markdown dialect used when publishing the documentation to the web site is much more sensitive than Github-flavoured Markdown. In particular, it requires an empty line before code blocks (unless the code block starts right after a heading), otherwise the code block gets formatted in-line with the previous paragraph. Likewise for bullet-point lists.	2019-07-15 09:55:18 -07:00
Gian Merlino	ffa25b7832	Query vectorization. (#6794 ) * Benchmarks: New SqlBenchmark, add caching & vectorization to some others. - Introduce a new SqlBenchmark geared towards benchmarking a wide variety of SQL queries. Rename the old SqlBenchmark to SqlVsNativeBenchmark. - Add (optional) caching to SegmentGenerator to enable easier benchmarking of larger segments. - Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark. * Query vectorization. This patch includes vectorized timeseries and groupBy engines, as well as some analogs of your favorite Druid classes: - VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.) - VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has methods to create analogs of the column selectors you know and love. - VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset. - VectorAggregator is like BufferAggregator. - VectorValueMatcher is like ValueMatcher. There are some noticeable differences between vectorized and regular execution: - Unlike regular cursors, vector cursors do not understand time granularity. They expect query engines to handle this on their own, which a new VectorCursorGranularizer class helps with. This is to avoid too much batch-splitting and to respect the fact that vector selectors are somewhat more heavyweight than regular selectors. - Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes for filters that might partially support them (like an OR of one filter that supports indexing and another that doesn't). I'm not sure that this behavior is desirable anyway (it is potentially too eager) but, at any rate, it'd be better to harmonize it between the two classes. Potentially they should both do some different thing that is smarter than what either of them is doing right now. - When vector cursors are created by QueryableIndexCursorSequenceBuilder, they use a morphing binary-then-linear search to find their start and end rows, rather than linear search. Limitations in this patch are: - Only timeseries and groupBy have vectorized engines. - GroupBy doesn't handle multi-value dimensions yet. - Vector cursors cannot handle virtual columns or descending order. - Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not". - Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered". - Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs). Currently, the testing strategy includes adding vectorization-enabled tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest, GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the filtering tests that extend BaseFilterTest. In all of those classes, there are some test cases that don't support vectorization. They are marked by special function calls like "cannotVectorize" or "skipVectorize" that tell the test harness to either expect an exception or to skip the test case. Testing should be expanded in the future -- a project in and of itself. Related to #3011. * WIP * Adjustments for unused things. * Adjust javadocs. * DimensionDictionarySelector adjustments. * Add "clone" to BatchIteratorAdapter. * ValueMatcher javadocs. * Fix benchmark. * Fixups post-merge. * Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex. * BloomDimFilterSqlTest: Tag two non-vectorizable tests. * Minor adjustments. * Update surefire, bump up Xmx in Travis. * Some more adjustments. * Javadoc adjustments * AggregatorAdapters adjustments. * Additional comments. * Remove switching search. * Only missiles.	2019-07-12 12:54:07 -07:00
Chi Cao Minh	da3d141dd2	Add inline firehose (#8056 ) * Add inline firehose To allow users to quickly parsing and schema, add a firehose that reads data that is inlined in its spec. * Address review comments * Remove suppression of sonar warnings	2019-07-11 21:43:46 -07:00
Atul Mohan	631cda649b	Include replicated segment size property for datasources endpoint (#8039 ) * Add replication size * Summon comma	2019-07-11 01:10:38 -07:00
Himanshu	14aec7fcec	add config to optionally disable all compression in intermediate segment persists while ingestion (#7919 ) * disable all compression in intermediate segment persists while ingestion * more changes and build fix * by default retain existing indexingSpec for intermediate persisted segments * document indexSpecForIntermediatePersists index tuning config * fix build issues * update serde tests	2019-07-10 12:22:24 -07:00
Jihoon Son	0a3538b569	Fix license check in travis and make it optional (#8049 ) * Fix license check in travis and make it optional * debug * fix build * too loud maven * move MAVEN_OPTS to top and add comments * adjust script * remove mvn option from python script	2019-07-09 19:35:29 -07:00
Sashidhar Thallam	3353da2974	Adding missing docs for druid.indexer.logs.disableAcl (#8046 )	2019-07-09 16:11:25 -07:00
Jihoon Son	12f12676e3	Binary license management system (#7998 ) * Binary license management system * add missing file * add comment * Address comments * print missing licenses * print druid module name * Add missing licenses and update versions * fix library versions and add missing ones. also fix pom.xml * testing multi thread * Parallel report generation * fix build error * install pyyaml and use old api * install python3 * fix travis script * python3.6 * pip * setuptools * python3-setuptools * address comment * error on not found reports or registered licenses * removed licenses * debug * travis debug * add missing licenses * travis debug * debug * remove debug code * test build script * travis debug * still debug * add missing python lib * debug * debug * fix travis * fix travis * debug travis * flush print * print something more to keep travis alive * adjust print * single threaded * single threaded * debug * debug * remove debug * remove deprecated-2017Q4 from travis conf * remove comments and duplicate sudo	2019-07-08 12:24:51 -07:00
Eyal Yurman	2eee711653	Add missing reference to Materialized-View extension. (#8003 ) * Reference Materialized View extension from extensions page. * Add comma	2019-07-06 13:50:41 -07:00
Dinesh Sawant	9c7c7c58ae	Fix overlord port in delete data tutorial (#8037 ) In Single-Server Quickstart tutorial the overlord and coordinator is started as one process on port 8081. But in delete data tutorial the kill task is sent to 8090 port, which fails.	2019-07-06 08:50:01 -07:00
Chi Cao Minh	0ded0ce414	Add round support for DS-HLL (#8023 ) * Add round support for DS-HLL Since the Cardinality aggregator has a "round" option to round off estimated values generated from the HyperLogLog algorithm, add the same "round" option to the DataSketches HLL Sketch module aggregators to be consistent. * Fix checkstyle errors * Change HllSketchSqlAggregator to do rounding * Fix test for standard-compliant null handling mode	2019-07-05 15:37:58 -07:00
Clint Wylie	42a7b8849a	remove FirehoseV2 and realtime node extensions (#8020 ) * remove firehosev2 and realtime node extensions * revert intellij stuff * rat exclusion	2019-07-04 15:40:22 -07:00
Gian Merlino	613f09b45a	SQL: Add TIME_CEIL function. (#8027 ) Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to share more code.	2019-07-04 15:40:03 -07:00
Clint Wylie	3b84246cd6	add SQL docs for multi-value string dimensions (#8011 ) * add SQL docs for multi-value string dimensions * formatting consistency * fix typo * adjust	2019-07-03 08:22:33 -07:00
Clint Wylie	c556d44a19	more sql support for expression array functions (#7974 ) * more sql support for expression array functions * prepend/slice * doc fixes * fix imports * fix tests * add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval * re-arrange * imports :( * add append/prepend test	2019-07-02 21:39:26 -07:00

2131 Commits