druid

Commit Graph

Author	SHA1	Message	Date
Karan Kumar	b94390ba33	Adding Shared Access resource support for azure (#12266 ) Azure Blob storage has multiple modes of authentication. One of them is Shared access resource . This is very useful in cases when we do not want to add the account key in the druid properties .	2022-02-22 18:27:43 +05:30
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Victoria Lim	4ede3bbff6	Docs updates (#12069 ) * minor updates to docs * remove en.json	2021-12-14 14:38:18 -08:00
jacobtolar	f7f5505631	Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865 ) * Update docs - Kinesis InputFormat ingestion * Add avro_ocf to list of supported Kafka InputFormats * Remove extra whitespace. * Update kafka-supervisor-reference.md * Delete extra whitespace.	2021-12-03 07:57:26 -08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
benkrug	11746b8536	Update datasketches-hll.md (#12010 ) under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively." 21 is not a power of 2, nor is 12, the given default. I think there may have been confusion because lgK represents log2 of K. We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.	2021-11-30 18:52:00 -08:00
Charles Smith	f536f31229	clarify avro support & general style improvements (#11975 ) * clarify avro support & general style improvements * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update avro.md remove redundancy Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-28 16:10:18 +08:00
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
zachjsh	1d6df48145	Warn if cache size of lookup is beyond max size (#11863 ) Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality, is make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in, to warn when its cache is beyond mxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the services configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underling table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this pr makes it so that the jdbc query results are instead streamed through an iterator.	2021-11-03 21:32:22 -04:00
Charles Smith	6089a168ea	Docs - update dynamic config provider topic (#11795 ) * update dynamic config provider * update topic * add examples for dynamic config provider: * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-10-14 17:51:32 -07:00
Victoria Lim	42e44269be	Docs update for druid-basic-security (#11782 ) * update druid-basic-security * typo * revisions from review	2021-10-08 14:45:09 -07:00
lokesh-lingarajan	ad6609a606	Kafka Input Format for headers, key and payload parsing (#11630 ) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights. PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row. Lets look at a sample input format from the above discussion "inputFormat": { "type": "kafka", // New input format type "headerLabelPrefix": "kafka.header.", // Label prefix for header columns, this will avoid collusions while merging columns "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case payload does not carry timestamp "headerFormat": // Header parser specifying that values are of type string { "type": "string" }, "valueFormat": // Value parser from json parsing { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [...] } }, "keyFormat": // Key parser also from json parsing { "type": "json" } } Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload. Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch. Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases. During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.	2021-10-07 08:56:27 -07:00
Charles Smith	8fd17fe0af	fix a few typos in Kinesis doc (#11776 )	2021-10-06 19:43:20 -07:00
Frank Chen	104c9a07f0	Fix broken anchor and heading levels in Kafka/Kinesis ingestion (#11748 ) * Fix broken anchor and heading levels * Fix CI	2021-10-05 19:30:50 -07:00
Vaibhav	3c4bba1478	Update kinesis-ingestion.md (#11767 ) * Update kinesis-ingestion.md It seems that we are declaring (a final int) recordsPerFetch as 400 and fetchDelayMillis as 0 in https://github.com/implydata/druid/blob/imply-2021.09/extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskIOConfig.java#L36 ``` public static final int DEFAULT_RECORDS_PER_FETCH = 4000; public static final int DEFAULT_FETCH_DELAY_MILLIS = 0; ``` updating `recordsPerFetch` and `fetchDelayMillis` to actual default values as hardcoded above . * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-04 11:26:53 -07:00
Jihoon Son	7e90d00cc0	Configurable maxStreamLength for doubles sketches (#11574 ) * Configurable maxStreamLength for doubles sketches * fix equals/hashcode and it test failure * fix test * fix it test * benchmark * doc * grouping key * fix comment * dependency check * Update docs/development/extensions-core/datasketches-quantiles.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-31 14:56:37 -07:00
zhangyue19921010	6d14ea2d14	Dynamic auto scale Kinesis-Stream ingest tasks (#10985 ) * ready to test * revert misc.xml * document kinesis md * Update docs/development/extensions-core/kafka-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update docs/development/extensions-core/kinesis-ingestion.md * Update kafka-ingestion.md remove leading ` * Update kinesis-ingestion.md add missing ` Co-authored-by: yuezhang <yuezhang@freewheel.tv> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-30 15:44:29 -07:00
Charles Smith	9032a0b079	updates Kafka and Kinesis to use . Fixes some typos and other style i… (#11624 ) * updates Kafka and Kinesis to use . Fixes some typos and other style issues for Kafka. * fix spelling * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * address comments Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-26 13:22:30 -07:00
Clint Wylie	ec334a641b	MySQL extension with MariaDB connector docs (#11608 ) * add docs for mariadb support via mysql extensions * add logging so you know what druid knows * homogenize * spelling * missed a couple	2021-08-19 01:52:26 -07:00
Karan Kumar	d1bad92880	Made the instructions of adding extra resources as part of extensions simpler (#11577 )	2021-08-17 17:33:55 +05:30
sthetland	95c5bc3a6d	Clarify when changes to credentialIterations take effect (#11590 ) This change updates doc to clarify when and how a change to druid.auth.authenticator.basic.credentialIterations takes effect: changes apply only to new users or existing users upon changing their password via the credentials API, which may not be the expectation.	2021-08-13 17:02:07 -07:00
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Yi Yuan	59c8430d29	change document (#11545 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-06 07:57:12 -07:00
Peter Marshall	973e5bf7d0	Docs - HLL lgK tip and slight layout change (#11482 ) * HLL lgK and a tip Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200. Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated. Also added a note about using K over 16 being pretty much pointless. * Corrected spelling * Create datasketches-hll.md Put roll-up back to rollup * Update docs/development/extensions-core/datasketches-hll.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2021-07-26 12:28:53 -07:00
Joseph Glanville	d5e8d4d680	Avro union support (#10505 ) * Avro union support * Document new union support * Add support for AvroStreamInputFormat and fix checkstyle * Extend multi-member union test schema and format * Some additional docs and add Enums to spelling * Rename explodeUnions -> extractUnions * explode -> extract * ByType * Correct spelling error	2021-07-06 22:05:41 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Hoseung Lee	ed0a57e106	Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-06-24 21:54:48 -07:00
Yi Yuan	145cf9e5c3	fix document about input format (#11342 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-08 23:44:54 +08:00
frank chen	2ee7e31e5b	Fix syntax error (#11332 )	2021-06-07 22:35:02 -07:00
Yuanli Han	8647040f4d	Allow user to set group.id for Kafka ingestion task (#11147 ) * allow user to set group.id for Kafka ingestion task * fix test coverage by removing deprecated code and add doc * fix typo * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: frank chen <frankchen@apache.org> Co-authored-by: frank chen <frankchen@apache.org>	2021-05-09 11:56:19 +08:00
Jeet Patel	7139c60868	Change the `id` for `kubernetes` doc link to work (#11176 ) * Change the `id` for doc link to work * Added `druid-kubernetes-extensions` to the list	2021-04-28 10:12:28 -07:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
Himanshu	a0d52c3def	k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value (#10961 ) * k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value * update doc * fix test	2021-04-07 17:45:28 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Charles Smith	8544d29bc7	remove experimental from Kinesis with caveats (#10998 ) * remove experimental from Kinesis with caveats * add suggested known issue * spelling fixes	2021-03-29 13:57:58 -07:00
Parag Jain	2fdc313e4d	GCS lookup support (#11026 ) * GCS lookup support * checkstyle fix * review comments * review comments * remove unused import	2021-03-30 01:40:41 +05:30
Yi Yuan	36e86a2880	Add protobuf schema registry (#10839 ) * dd_protobuf_schema_registry * change licese * delete some annotation * nodify tests * delete extra exception * add licenses * add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version * seperate kafka-protobuf-provider * modify protobuf.md * refine protobuf.md * add config and header * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-03-09 15:15:51 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Jihoon Son	16acd6686a	Remove stale 'namespace' config for JDBC lookups from doc (#10886 ) * Remove stale 'namespace' config for JDBC lookups from doc and web-console * revert webconsole change * address comments	2021-03-04 17:16:34 -08:00
Abhishek Agarwal	96d26e5338	Fix kinesis ingestion bugs (#10761 ) * add offsetFetchPeriod to kinesis ingestion doc * Remove jackson dependencies from extensions * Use fixed delay for lag collection * Metrics reset after finishing processing * comments * Broaden the list of exceptions to retry for * Unit tests * Add more tests * Refactoring * re-order metrics * Doc suggestions Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Add tests Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-02-05 02:49:58 -08:00
Himadri Singh	1c1b396eaa	AWS Web Identity / IRSA Support (#10541 ) * AWS Web Identity Support required for AWS IRSA * Update kinesis-ingestion.md * disabling coverage tests https://github.com/apache/druid/pull/10541#issuecomment-737558213 * exclude coverage * Update licenses.yaml	2021-01-25 18:44:02 +05:30
Abhishek Agarwal	f66fdbfa5d	add offsetFetchPeriod to kinesis ingestion doc (#10734 )	2021-01-08 14:19:26 -08:00
Himanshu	c7b1212a43	AWS RDS token based password provider (#9518 ) * refresh db pwd * aws iam token password provider * fix analyze-dependencies build * fix doc build * add ut for BasicDataSourceExt * more doc updates * more doc update * moving aws token password provider to new extension * remove duplicate changes * make all config inline * extension docs * refresh db password in SQL Firehose code path as well * add ut * fix build * add new extension to distribution * rds lib is not provided * fix license build * add version to license * change parent version to 0.19.0-snapshot * address review comments * fix core/ code coverage * Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> * address review comments * fix spellchecker * remove inadvertant website file change Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-01-06 21:15:29 -08:00
Himanshu	d2e6240cac	k8s-int-test-build: zk-less druid cluster and http based segment/task managment (#10686 ) * zk-less druid cluster in k8s build * attempt to fix build and use http based remote task management * mm/router logs for debugging * add default account k8s role and binding for pod, configMap access * fix issue * change router port to 8088 for common readinessProbe * break build_run_k8s_cluster.sh into separate scripts * revert changes to K8sDruidNodeAnnouncer.java * k8s extension doc update * add license to new file * address review comments * do not try to load lookups at startup to improve cluster startup time	2021-01-05 18:51:47 -08:00
Charles Smith	797371598d	update syntax for golbal cached uri lookups (#10629 )	2020-12-24 09:49:01 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Himanshu	be019760bb	document DynamicConfigProvider for kafka consumer properties (#10658 ) * document DynamicConfigProvider for kafka consumer properties * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md * fix doc build Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-12-10 08:24:33 -08:00
zhangyue19921010	229b5f359f	Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better (#10551 ) * remove build in kafka consumer config : * modify druid docs of kafka indexing service * yuezhang * modify doc * modify docs * fix kafkaindexTaskTest.java * revert uncessary change * add more logs and modify docs * revert jdk version * modify docs * modify-kafka-version v2 * modify docs * modify docs * modify docs * modify docs * modify docs * done * remove useless import * change code and add UT Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-03 17:37:59 -08:00
sthetland	ba915b7f56	Security overview documentation (#10339 ) * initial file * initial file * security overview added * ldap added * spacing adjustments * nits * security graphics and doc review * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-user-auth.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * updates frm review * review comments * finish up review and light edits * broken links * spell check Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2020-11-19 15:24:58 -08:00
Pierre Carrier	835b328851	docs/: use tuningConfig (#10540 )	2020-10-30 09:39:21 -05:00
Charles Smith	9c51047cc8	Document correlation between credential iterations and query latency (#10532 ) use link / heading instead of footnote	2020-10-29 12:47:24 -07:00
Joseph Glanville	7ce9ac4548	Fix Avro support in Web Console (#10232 ) * Fix Avro OCF detection prefix and run formation detection on raw input * Support Avro Fixed and Enum types correctly * Check Avro version byte in format detection * Add test for AvroOCFReader.sample Ensures that the Sampler doesn't receive raw input that it can't serialize into JSON. * Document Avro type handling * Add TS unit tests for guessInputFormat	2020-10-07 21:08:22 -07:00
Clint Wylie	b95bf444b2	add docs for kinesis lag metrics (#10435 )	2020-09-28 13:13:53 -07:00
Suneet Saldanha	0891b1f833	Add note about aggregations on floats (#10285 ) * Add note about aggreations on floats Floating point math is known to be unstable. Due to the way aggregators work across segments it's possible for the same query operating on the same data to produce slightly different results. The same problem exists with any aggregators that are not commutative since the merge order across segments is not guaranteed. * Also talk about doubles * Apply suggestions from code review	2020-08-17 13:29:57 -07:00
Joseph Glanville	f3023c6058	Fix formatting in druid-pac4j documentation (#10174 ) Superfluous column broke table formatting.	2020-07-12 18:51:42 -07:00
Antoine Huret	88d20a61a6	renamed authenticationChain to authenticatorChain (#10143 )	2020-07-08 19:58:21 -07:00
Gian Merlino	9587fc0b84	Fix documentation for Kinesis fetchThreads. (#10156 ) * Fix documentation for Kinesis fetchThreads The default was changed in #9819, but the documentation wasn't updated. * Add 'procs' to spelling.	2020-07-08 19:47:09 -07:00
Clint Wylie	c5540f46ed	fixes for ranger docs (#10109 )	2020-07-01 18:26:41 -07:00
Clint Wylie	477335abb4	update links datasketches.github.io to datasketches.apache.org (#10107 ) * update links datasketches.github.io to datasketches.apache.org * now with more apache * oops * oops	2020-07-01 14:56:17 -07:00
Lee Rhodes	7b4edc93fc	Update web address to datasketches.apache.org (#10096 )	2020-06-30 19:05:23 -07:00
sthetland	ce03f31a73	Clarifying workerThreads and a few other nits (#9804 ) * Update data-formats.md Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }" * Light text cleanup * Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here. * Clarifying accepted values for URI lookup * Update index.md * original quickstart full first pass * original quickstart full first pass * first pass all the way through * straggler * image touchups and finished old tutorial * a bit of finishing up * druid-caffeine-cache ext previously removed * Sample MaxDirectMemorySize value unrealistic * Review comments * fixing links * spell checking gymnastics * workerThreads desc slightly expanded * typo * Typo * Reversing Kafka config order * Changing order of configs for Kinesis * Trying this again: ioConfig then tuningConfig	2020-05-06 09:05:18 -07:00
Alexander Saydakov	844d626738	added number of bins parameter (#9436 ) * added number of bins parameter * addressed review points * test equals Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2020-05-04 16:53:09 -07:00
Jian Wang	85dfbb64cb	Update documention for metricCompression (#9811 )	2020-05-03 12:56:48 -07:00
Will Salisbury	cda9f41e69	s/S3/GCS/g (#9700 ) fix typo [ at least I hope this was a typo… ]	2020-04-14 18:39:54 -07:00
Himanshu	ca369e5768	druid-pac4j: add ability to use custom ssl trust store while talking to auth server (#9637 ) * druid-pac4j: add ability for custom ssl trust store for talking to auth server * fix nimbusds DefaultResourceRetriever name in comment	2020-04-10 18:01:59 -07:00
bolkedebruin	ab5ac7f890	Document possible vulnerabilities for the druid-ranger-security (#9649 ) * Document possible vulnerabilities for the druid-ranger-security In certain configurations the ranger plugin can expose vulnerabilities due to some of its dependencies having CVEs. * Spelling checker is a bit tight	2020-04-09 10:43:11 -07:00
bolkedebruin	2d99966933	Add Apache Ranger Authorization (#9579 )	2020-04-04 18:02:24 +02:00
Maytas Monsereenusorn	1852bf33ea	Add Integration Test for functionality of kinesis ingestion (#9576 ) * kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * fix kinesis timeout * Kinesis IT * Kinesis IT * fix checkstyle * Kinesis IT * address comments * fix checkstyle	2020-04-03 09:45:22 -07:00
Clint Wylie	bf85ea19b2	roaring bitmaps by default (#9548 ) * it is finally time * fix it * more docs * fix doc	2020-03-23 18:15:57 -07:00
Himanshu	5604ac7963	druid extension for OpenID Connect auth using pac4j lib (#8992 ) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * licesne file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter	2020-03-23 18:15:45 -07:00
Chi Cao Minh	e7b3dd9cd1	Update to mysql connector 5.1.48 (#9514 )	2020-03-16 10:38:31 -07:00
Maytas Monsereenusorn	92fb83726b	Add support for optional aws credentials for s3 for ingestion (#9375 ) * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * fix build failure * fix failing build * fix failing build * Code cleanup * fix failing test * Removed CloudConfigProperties and make specific class for each cloudInputSource * Removed CloudConfigProperties and make specific class for each cloudInputSource * pass s3ConfigProperties for split * lazy init s3client * update docs * fix docs check * address comments * add ServerSideEncryptingAmazonS3.Builder * fix failing checkstyle * fix typo * wrap the ServerSideEncryptingAmazonS3.Builder in a provider * added java docs for S3InputSource constructor * added java docs for S3InputSource constructor * remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider	2020-02-25 20:59:53 -08:00
zachjsh	d771b42ed1	Move Azure extension into Core (#9394 ) * Move Azure extension into Core Moving the azure extension into Core. * * Fix build failure * * Add The MIT License (MIT) to list of compatible licenses * * Address review comments * * change reference to contrib azure to core azure * * Fix spelling mistakes.	2020-02-25 17:49:16 -08:00
Chi Cao Minh	7fc99ee206	Add common optional dependencies for extensions (#9399 ) * Add common optional dependencies for extensions Include hadoop-aws and postgres JDBC connector jar to improve out-of-the-box experience for extensions. The mysql JDBC connector jar is not bundled as it is GPL. * Update docs * Fix typo	2020-02-25 00:04:00 -08:00
zachjsh	f707064bed	Add Azure config options for segment prefix and max listing length (#9356 ) * Add Azure config options for segment prefix and max listing length Added configuration options to allow the user to specify the prefix within the segment container to store the segment files. Also added a configuration option to allow the user to specify the maximum number of input files to stream for each iteration. * * Fix test failures * * Address review comments * * add dependency explicitly to pom * * update docs * * Address review comments * * Address review comments	2020-02-21 14:12:03 -08:00
Clint Wylie	c3ebb5eb65	variance aggregator support for double columns (#9076 ) * variance aggregator support for double column instead of casting to float * docs * everything in its right place * checkstyle * adjustments	2020-02-12 09:32:42 -08:00
Dusan Maric	ebd199da73	docs: extensions-core: mysql: fix MySQL connector library Maven Central URL (#9344 )	2020-02-10 21:54:46 -08:00
Clint Wylie	b55657cc26	fix protobuf extension packaging and docs (#9320 ) * fix protobuf extension packaging and docs * fix paths * Update protobuf.md * Update protobuf.md	2020-02-07 09:26:52 -08:00
sthetland	556a3861ed	Make docs on reset supervisor operation scarier (#9288 ) * Update kafka-ingestion.md Companion doc update to #9253, intended to make a supervisor reset scarier * Update kinesis-ingestion.md	2020-02-04 15:30:31 -08:00
Suneet Saldanha	93167188ea	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:49:33 -08:00
Jihoon Son	153495068b	Doc update for the new input source and the new input format (#9171 ) * Doc update for new input source and input format. - The input source and input format are promoted in all docs under docs/ingestion - All input sources including core extension ones are located in docs/ingestion/native-batch.md - All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md - New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md * parquet * add warning for range partitioning with sequential mode * hdfs + s3, gs * add fs impl for gs * address comments * address comments * gcs	2020-01-17 15:52:05 -08:00
singh	936b9bdfd0	add deets about the keyfile (#9209 )	2020-01-17 11:24:49 -08:00
Suneet Saldanha	92ac22d060	Link javaOpts to middlemanager runtime.properties docs (#9101 ) * Link javaOpts to middlemanager runtime.properties docs * fix broken link * reword config links	2020-01-15 21:22:49 -08:00
Suneet Saldanha	85a3d416b0	Tutorials use new ingestion spec where possible (#9155 ) * Tutorials use new ingestion spec where possible There are 2 main changes * Use task type index_parallel instead of index * Remove the use of parser + firehose in favor of inputFormat + inputSource index_parallel is the preferred method starting in 0.17. Setting the job to index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent of an index task Instead of using a parserSpec, dimensionSpec and timestampSpec have been promoted to the dataSchema. The format is described in the ioConfig as the inputFormat. There are a few cases where the new format is not supported * Hadoop must use firehoses instead of the inputSource and inputFormat * There is no equivalent of a combining firehose as an inputSource * A Combining firehose does not support index_parallel * fix typo	2020-01-15 14:08:29 -08:00
Jonathan Wei	d1500c1328	Update Kinesis resharding information about task failures (#9104 )	2020-01-07 15:44:48 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Clint Wylie	5ecdf94d83	add 'prefixes' support to google input source (#8930 ) * add prefixes support to google input source, making it symmetrical-ish with s3 * docs * more better, and tests * unused * formatting * javadoc * dependencies * oops * review comments * better javadoc	2019-12-04 21:01:10 -08:00
Jonathan Wei	00ce18a0ea	Additional Kinesis resharding fixes (#8870 ) * Additional Kinesis resharding fixes * Address PR comments * Remove unused method * Adjust SegmentTransactionalInsertAction null handling * Check for unchanged metadata on empty publish * Add logs for empty publish * Fix javadoc * Clear offset when invalid endOffsets are seen * Fix LGTM alert * Fix build * Add resharding note to Kinesis docs * Checkstyle * Spelling * Address PR comments * Checkstyle	2019-11-28 12:59:01 -08:00
Clint Wylie	4458113375	S3 input source (#8903 ) * add s3 input source for native batch ingestion * add docs * fixes * checkstyle * lazy splits * fixes and hella tests * fix it * re-use better iterator * use key * javadoc and checkstyle * exception * oops * refactor to use S3Coords instead of URI * remove unused code, add retrying stream to handle s3 stream * remove unused parameter * update to latest master * use list of objects instead of object * serde test * refactor and such * now with the ability to compile * fix signature and javadocs * fix conflicts yet again, fix S3 uri stuffs * more tests, enforce uri for bucket * javadoc * oops * abstract class instead of interface * null or empty * better error	2019-11-25 22:31:19 -08:00
Clint Wylie	7250010388	add parquet support to native batch (#8883 ) * add parquet support to native batch * cleanup * implement toJson for sampler support * better binaryAsString test * docs * i hate spellcheck * refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest * add comment, fix some stuff * adjustments * fix accident * tweaks	2019-11-22 10:49:16 -08:00
Clint Wylie	d67c3c7aed	document SQL compatible null handling mode (#8894 ) * document SQL compatible null handling mode * adjustments * fix docs * review changes	2019-11-20 06:52:20 -08:00
Clint Wylie	074a45219d	add google cloud storage InputSource for native batch (#8907 ) * add google cloud storage InputSource for native batch * rename * checkstyle * fix * fix spelling * review comments	2019-11-19 19:49:43 -08:00
Chi Cao Minh	d60978343a	Improve missing JDBC driver error for lookups (#8872 ) If the JDBC drivers are missing from the lookup extensions, throw an exception that directs the user how to resolve the issue. This change is a follow up to #8825.	2019-11-18 11:42:38 -08:00
fst0	80dbf44fca	Add reference to druid.storage.type (#8857 ) * Add reference to `druid.storage.type` This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct. * Update s3.md Add global storage parameter to knob table. * Update s3.md	2019-11-13 10:03:41 -08:00
Lucas Capistrant	a066cc5648	Fix groupMapping endpoint URIs in druid-basic-security doc (#8847 )	2019-11-12 21:12:34 +05:30
Jad Naous	ce3c0dae4d	Add note on JDBC libs for lookups (#8825 ) * Add note on JDBC libs for lookups * Fix directory and additional "the"	2019-11-06 13:31:26 -08:00
Giuseppe Martino	9c171e2b1f	Message rejection absolute date (#8656 ) * Add option lateMessageRejectionStartDate * Use option lateMessageRejectionStartDate * Fix tests * Add lateMessageRejectionStartDate to kafka indexing service * Update tests kafka indexing service * Fix tests for KafkaSupervisorTest * Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig * Fix var name * Update documentation * Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.	2019-10-31 15:13:02 -07:00
Gian Merlino	7605c23354	Remove Tranquility configs and certain doc references. (#8793 ) Since it hasn't received updates or community interest in a while, it makes sense to de-emphasize it in the distribution and most documentation (outside of simple mentions of its existence).	2019-10-30 16:30:16 -07:00
Gian Merlino	aa81253cf4	Fix typos. (#8767 )	2019-10-28 12:47:01 -07:00

1 2 3 4

160 Commits