druid

Commit Graph

Author	SHA1	Message	Date
Jihoon Son	ad437dd655	Add shuffle metrics for parallel indexing (#10359 ) * Add shuffle metrics for parallel indexing * javadoc and concurrency test * concurrency * fix javadoc * Feature flag * doc * fix doc and add a test * checkstyle * add tests * fix build and address comments	2020-10-10 19:35:17 -07:00
Joseph Glanville	7ce9ac4548	Fix Avro support in Web Console (#10232 ) * Fix Avro OCF detection prefix and run formation detection on raw input * Support Avro Fixed and Enum types correctly * Check Avro version byte in format detection * Add test for AvroOCFReader.sample Ensures that the Sampler doesn't receive raw input that it can't serialize into JSON. * Document Avro type handling * Add TS unit tests for guessInputFormat	2020-10-07 21:08:22 -07:00
Mainak Ghosh	8168e14e92	Adding task slot count metrics to Druid Overlord (#10379 ) * Adding more worker metrics to Druid Overlord * Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better * Few more instance of worker usage replaced with peon * Modifying the peon idle count logic to only use eligible workers available capacity * Changing the naming to task slot count instead of peon * Adding some unit test coverage for the new test runner apis * Addressing Review Comments * Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics * Fixing the spelling issue in the docs * Setting the annotation Nullable on the TaskSlotCountStatsProvider methods	2020-09-28 23:50:38 -07:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Clint Wylie	b95bf444b2	add docs for kinesis lag metrics (#10435 )	2020-09-28 13:13:53 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Jonathan Wei	cb30b1fe23	Automatically determine numShards for parallel ingestion hash partitioning (#10419 ) * Automatically determine numShards for parallel ingestion hash partitioning * Fix inspection, tests, coverage * Docs and some PR comments * Adjust locking * Use HllSketch instead of HyperLogLogCollector * Fix tests * Address some PR comments * Fix granularity bug * Small doc fix	2020-09-24 13:47:53 -07:00
Clint Wylie	dad69481f0	add light weight version of /druid/coordinator/v1/lookups/nodeStatus (#10422 ) * add light weight version /druid/coordinator/v1/lookups/nodeStatus * review stuffs	2020-09-24 14:36:53 +08:00
Maytas Monsereenusorn	72f1b55f56	Add last_compaction_state to sys.segments table (#10413 ) * Add is_compacted to sys.segments table * change is_compacted to last_compaction_state * fix tests * fix tests * address comments	2020-09-23 15:29:36 -07:00
sthetland	ae247b6e63	Document change in results of groupBy queries with subtotalsSpec (#10405 ) * subtotalsSpec results with null values Document the format change in results of a groupBy query with a subtotalsSpec. This update applies to 0.18 and later. * Review catches	2020-09-19 10:51:23 -07:00
Mainak Ghosh	14072d3ab0	Adding more dimensions to the audit log entry (#10373 ) * Adding more dimensions to the audit log entry * Making adding payload in audit metric optional * Changing the name of the parameter to includePayloadAsDimensionInMetric. Adding a unit test * Fixing the intellij code introspection issues	2020-09-17 18:36:28 -07:00
Atul Mohan	b6ad790dc7	Support combining inputsource for parallel ingestion (#10387 ) * Add combining inputsource * Fix documentation Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-09-15 16:25:35 -07:00
Jihoon Son	8657b23ab2	Integration tests and docs for auto compaction with different partitioning (#10354 ) * Working * add test * doc * fix test * split other integration test * exclude other-index from other tests * doc anchor fix * adjust task slots and number of merge tasks * spell check * reduce maxNumConcurrentSubTasks to 1 * maxNumConcurrentSubtasks for range partitinoing * reduce memory for historical * change group name	2020-09-15 11:28:09 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Abhishek Agarwal	f5e2645bbb	Support SearchQueryDimFilter in sql via new methods (#10350 ) * Support SearchQueryDimFilter in sql via new methods * Contains is a reserved word * revert unnecessary change * Fix toDruidExpression method * rename methods * java docs * Add native functions * revert change in dockerfile * remove changes from dockerfile * More tests * travis fix * Handle null values better	2020-09-14 09:57:54 -07:00
Curt Buechter	e3735602f2	Fix typo (#10385 )	2020-09-11 16:31:36 -07:00
Lucas Capistrant	690e070c43	Fix doc for name of dynamic config to pause coordination (#10345 )	2020-09-11 08:40:06 -05:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00
LightGHLi	a3bb6ee4a6	Add missing comma between JSON members in data-formats.md (#10343 )	2020-09-03 20:03:06 -07:00
Gian Merlino	5cd7610fb6	SQL support for union datasources. (#10324 ) * SQL support for union datasources. Exposed via the "UNION ALL" operator. This means that there are now two different implementations of UNION ALL: one at the top level of a query that works by concatenating subquery results, and one at the table level that works by creating a UnionDataSource. The SQL documentation is updated to discuss these two use cases and how they behave. Future work could unify these by building support for a native datasource that represents the union of multiple subqueries. (Today, UnionDataSource can only represent the union of tables, not subqueries.) * Fixes. * Error message for sanity check. * Additional test fixes. * Add some error messages.	2020-08-28 07:57:06 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Fernando	69d8645425	Adding supported compression formats for native batch ingestion (#10306 ) * Adding supported compression formats for native batch ingestion * Update docs/ingestion/native-batch.md Co-authored-by: sthetland <steve.hetland@imply.io> * fix spellcheck Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: sthetland <steve.hetland@imply.io>	2020-08-26 12:39:48 -07:00
Gian Merlino	91bb27cdf7	Clarify SQL behavior for multi-value dimensions. (#10276 ) There are some known inconsistencies between SQL and native that users should be aware of.	2020-08-25 10:11:16 -07:00
frank chen	028442e75e	Redis cache extension enhancement (#10240 ) * support redis cluster * add 'password', 'database' properties * test cases passed * update doc * some improvements * fix CI * add more test cases to improve branch coverage * fix dependency check for test * resolve review comments	2020-08-24 10:29:04 +08:00
Gian Merlino	0910d22f48	Add SQL "OFFSET" clause. (#10279 ) * Add SQL "OFFSET" clause. Under the hood, this uses the new offset features from #10233 (Scan) and #10235 (GroupBy). Since Timeseries and TopN queries do not currently have an offset feature, SQL planning will switch from one of those to Scan or GroupBy if users add an OFFSET. Includes a refactoring to harmonize offset and limit planning using an OffsetLimit wrapper class. This is useful because it ensures that the various places that need to deal with offset and limit collapsing all behave the same way, using its "andThen" method. * Fix test and add another test.	2020-08-21 14:11:54 -07:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243 ) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Suneet Saldanha	0891b1f833	Add note about aggregations on floats (#10285 ) * Add note about aggreations on floats Floating point math is known to be unstable. Due to the way aggregators work across segments it's possible for the same query operating on the same data to produce slightly different results. The same problem exists with any aggregators that are not commutative since the merge order across segments is not guaranteed. * Also talk about doubles * Apply suggestions from code review	2020-08-17 13:29:57 -07:00
Vatsal Bajpai	ee40d00be1	typo fix from hear to here (#10292 ) Should be `There are no other changes that need to be made here`	2020-08-17 07:54:21 -07:00
Xavier Léauté	225490474d	Update Kafka dependencies to 2.6.0 (#10286 ) * update Kafka dependencies to Kafka 2.6.0 * switch to Scala 2.13 build of Kafka * update integration tests * update Kafka tutorial	2020-08-15 07:56:40 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Clint Wylie	e053348f74	add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219 ) * add isNullable to ColumnCapabilities, ColumnAnalysis * better builder * fix segment metadata queries in integration tests * adjustments * cleanup * fix spotbugs * treat unknown as true in segmentmetadata * rename to hasNulls, add docs * fixup * test the dim indexer selector isNull fix for numeric columns * fixes * oof	2020-08-13 14:55:32 -07:00
Gian Merlino	d36a0f61da	Clarify documentation on dimensions, dimensionExclusions. (#10265 ) In particular: exclusions are ignored if dimensions are set.	2020-08-12 08:06:53 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMniAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Atul Mohan	06539bc828	Set default server.maxsize to the sum of segment cache (#10255 ) * Default server.maxsize * Remove maxsize refs from config Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-10 09:21:22 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203 ) * support unit suffix on byte-related properties * add doc * change default value of byte-related properites in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Jian Wang	271f90f205	Add segment pruning for hash based shard spec (#9810 ) * Add segment pruning for hash based partitioning * Update doc * Add additional test * Address comments * Fix unit test failure Co-authored-by: Jian Wang <jwang@pinterest.com>	2020-07-30 18:44:26 -07:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Clint Wylie	79dffefbf8	add explicit example for jdbc query context on connection properties (#10182 ) * add explicit example for jdbc query context on connection properties * make comment clearer * Update sql.md * Update sql.md	2020-07-24 13:43:04 -07:00
mans2singh	d4bd6e5207	ingestion and tutorial doc update (#10202 )	2020-07-21 17:52:23 -07:00
Joseph Glanville	f3023c6058	Fix formatting in druid-pac4j documentation (#10174 ) Superfluous column broke table formatting.	2020-07-12 18:51:42 -07:00
Antoine Huret	88d20a61a6	renamed authenticationChain to authenticatorChain (#10143 )	2020-07-08 19:58:21 -07:00
Gian Merlino	9587fc0b84	Fix documentation for Kinesis fetchThreads. (#10156 ) * Fix documentation for Kinesis fetchThreads The default was changed in #9819, but the documentation wasn't updated. * Add 'procs' to spelling.	2020-07-08 19:47:09 -07:00
Jihoon Son	53a2550571	Follow-up for RetryQueryRunner fix (#10144 ) * address comments; use guice instead of query context * typo * QueryResource tests * address comments * catch queryException * fix spell check	2020-07-08 13:28:11 -07:00
Gian Merlino	11c0da8097	Add availability and consistency docs. (#10149 ) * Add availability and consistency docs. Describes transactional ingestion and atomic replacement. Also, this patch deletes some bad advice from the javadocs for SegmentTransactionalInsertAction. * Fix missing word.	2020-07-07 15:22:52 -07:00
Fullstop000	bcf41922ce	Remove unsupported task types in doc (#10111 )	2020-07-04 18:13:53 -07:00
Atul Mohan	367eaedbb4	Clarify change in behavior for druid.server.maxSize (#10105 ) * Clarify maxSize docs * Add info about maxSize Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-07-01 22:22:18 -07:00
frank chen	60c6bd5b4c	support Aliyun OSS service as deep storage (#9898 ) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>	2020-07-01 22:20:53 -07:00
Clint Wylie	c5540f46ed	fixes for ranger docs (#10109 )	2020-07-01 18:26:41 -07:00

2180 Commits