druid


Author	SHA1	Message	Date
Alberic Liu	811dcd1726	update protobuf.md (#16434)	2024-05-11 17:52:54 +08:00
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
Kashif Faraz	51104e8bb3	Docs: Remove references to Zk-based segment loading (#16360) Follow up to #15705 Changes: - Remove references to ZK-based segment loading in the docs - Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`	2024-05-01 08:06:00 +05:30
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
317brian	03c191f701	docs: clarify description of uri/uriprefix (#16110) * docs: clarify description of uri/uripath * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-03-13 11:52:01 -07:00
zachjsh	720f1e834a	Add support for AzureDNSZone enabled storage accounts used for deep storage (#16016) * Add support for AzureDNSZone enabled storage accounts used for deep storage Added a new config to AzureAccountConfig `storageAccountEndpointSuffix` which allows the user to specify a storage account endpoint suffix where the underlying storage account is enabled for AzureDNSZone. The previous config `endpointSuffix`, did not allow support for such accounts. The previous config has been deprecated in favor of this new config. Also fixed an issue where `managedIdentityClientId` was not being set properly * * address review comments * * add back azure government link and docs	2024-03-04 16:13:28 -05:00
317brian	3df161f73c	docs: update security doc for hashing (#15970) * docs: add mermaid diagram support * docs: update druid-basic-security doc to mention caching * Update docs/development/extensions-core/druid-basic-security.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-02-28 09:48:37 +08:00
Katya Macedo	0f29ece6a9	[Docs] Refactor streaming ingestion section (#15591) Merging the work so far. @ektravel, @vogievetsky if there are additional improvements, let's track them & make another pr. * Refactor streaming ingestion docs * Update property definition * Update after review * Update known issues * Move kinesis and kafka topics to ingestion, add redirects * Saving changes * Saving * Add input format text * Update after review * Minor text edit * Update example syntax * Revert back to colon * Fix merge conflicts * Fix broken links * Fix spelling error	2024-02-12 13:52:42 -08:00
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimension and metrics (1000+) it was observed that the creation/serialization of incremental segment file format for persistence and persisting the file took a while and it was blocking ingestion of new data. This affected the real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify number of persistence threads. The default value is 1 if it not specified which makes it the same as it is today.	2024-02-07 10:43:05 +05:30
zachjsh	9d4e8053a4	Kinesis adaptive memory management (#15360) ### Description Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings There is a problem with the logic that this pr fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance. Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes. We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html): This pr: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if `maxRecordsPerPoll` is specified Fixed issue that when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch. There was also a call to `newQ::offer` without check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking return value here, and failing if false. ### Release Note Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if `maxRecordsPerPoll` is specified	2024-01-19 14:30:21 -05:00
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Misha	ea6ba40ce1	Add support for Azure Goverment storage (#15523) Added support for Azure Government storage in Druid Azure-Extensions. This enhancement allows the Azure-Extensions to be compatible with different Azure storage types by updating the endpoint suffix from a hardcoded value to a configurable one.	2024-01-09 22:33:32 +05:30
George Shiqi Wu	8e95cea8e5	Azure client upgrade to allow identity options (#15287) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * Remove container client logic * Add some more testing * Update comments * Add a comment explaining client reuse * Move logic to factory class * use bom for dependency management * fix license versions --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-03 18:36:05 -05:00
Jan Werner	f4856bc1c1	ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481) * Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172 * remove the reference to outdated ranger 2.0 from the docs --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 08:24:37 -08:00
Jill Osborne	3fa856b3ff	Update Kinesis resharding doc (#15401)	2023-11-20 15:40:59 -08:00
317brian	dfc52994d4	docs: fix code tabs (#15403)	2023-11-20 11:16:10 -08:00
Charles Smith	e7d0429f5b	docs: suggest metadata store with instant ADD COLUMN semantics (#15334) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-09 12:56:30 -08:00
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806)	2023-11-07 10:07:28 -08:00
Pranav	c7d0615af3	Fix the build for #15013.: Lookup jitter upstream build fix (#15103) Fix the build for #15013.	2023-10-09 09:35:39 +05:30
Abhishek Agarwal	3c7b237c22	Add docs for ingesting Kafka topic name (#14894) Add documentation on how to extract the Kafka topic name and ingest it into the data.	2023-08-24 19:19:59 +05:30
Katya Macedo	5f74ef56f1	Clean up Kafka supervisor topic (#14651) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-21 11:55:38 -07:00
Abhishek Radhakrishnan	37db5d9b81	Reset offsets supervisor API (#14772) * Add supervisor /resetOffsets API. - Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets which accepts DataSourceMetadata as a body parameter. - Update logs, unit tests and docs. * Add a new interface method for backwards compatibility. * Rename * Adjust tests and javadocs. * Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules * UT fix * Doc updates. * remove extraneous debugging logs. * Remove the boolean setting; only ResetHandle() and resetInternal() * Relax constraints and add a new ResetOffsetsNotice; cleanup old logic. * A separate ResetOffsetsNotice and some cleanup. * Minor cleanup * Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers * Add unit tests for the no op implementations for test coverage * CodeQL fix * checkstyle from merge conflict * Doc changes * DOCUSAURUS code tabs fix. Thanks, Brian!	2023-08-17 14:13:10 -07:00
Abhishek Agarwal	b97cc45d81	Add clarification to the docs for multi-topic Kafka ingestion (#14847) Follow-up to #14828. Added some more clarification about how topicPattern is used.	2023-08-17 12:52:06 +05:30
317brian	6b4dda964d	Docusaurus2 upgrade for master (#14411) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-16 19:01:21 -07:00
Abhishek Agarwal	7911a04064	Refactoring of multi-topic kafka ingestion docs (#14828) In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.	2023-08-16 18:00:11 +05:30
Abhishek Agarwal	30b5dd4ca7	Add support to read from multiple kafka topics in same supervisor (#14424) This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.	2023-08-14 22:24:49 +05:30
Tejaswini Bandlamudi	a45b25fa1d	Removes support for Hadoop 2 (#14763) Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop	2023-08-09 17:47:52 +05:30
Abhishek Radhakrishnan	bff8f9e12e	Update kinesis docs (#14768) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-08-07 17:08:34 -07:00
Gian Merlino	5387f1bac0	Remove chatAsync parameter, so chat is always async. (#14692) * Remove chatAsync parameter, so chat is always async. chatAsync has been made default in Druid 26. I have seen good battle-testing of it in production, and am comfortable removing the older sync client. This was the last remaining usage of IndexTaskClient, so this patch deletes all that stuff too. * Remove unthrown exception. * Remove unthrown exception. * No more TimeoutException.	2023-07-31 19:42:51 -07:00
Katya Macedo	4804630c78	Clean up Kinesis doc (#14529)	2023-07-25 19:24:36 -07:00
Jaehui Lee	1f4ee5e21b	Docs: Change default value of "maxRowsInMemory" in tuningConfig (#14618) Reflecting fixes from https://github.com/apache/druid/pull/13939	2023-07-19 23:14:15 +05:30
Gian Merlino	95ca43034f	Change default handoffConditionTimeout to 15 minutes. (#14539) * Change default handoffConditionTimeout to 15 minutes. Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices: 1) Stop making progress on ingestion, wait for Historicals to load stuff, and keep the waiting-for-handoff segments available on realtime tasks. (handoffConditionTimeout = 0, the current default) 2) Continue making progress on ingestion, by exiting the realtime tasks that were waiting for handoff. Once the Historicals get their act together, the segments will be loaded, as they are still there on deep storage. They will just not be continuously available. (handoffConditionTimeout > 0) I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it. Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors. * Fix tests. * Update docs/development/extensions-core/kafka-supervisor-reference.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 13:17:14 -07:00
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old fomatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Nhi Pham	579b93f282	API reference refactor (#14372) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-06-26 15:48:54 -07:00
Andreas Maechler	55effd92cf	Docs: Typo and language cleanup in Kinesis ingestion docs (#14356) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-06-02 08:18:41 +05:30
Pramod Immaneni	1ac5544da7	Updated default value of maxTotalRows to reflect the value in the code (#14298)	2023-05-30 14:41:06 +05:30
Victoria Lim	6b3a6113c4	Doc: List supported values for Kafka `headerFormat` (#14316)	2023-05-22 15:41:07 -07:00
Katya Macedo	269137c682	Update Ingestion section (#14023) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-05-19 09:42:27 -07:00
317brian	6254658f61	docs: fix links (#14111)	2023-05-12 09:59:16 -07:00
TSFenwick	6c99fbea92	fix typo in s3 docs. add readme to s3 module. (#14135) * fix typo in s3 docs. add readme to s3 module. * Update extensions-core/s3-extensions/README.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * cleanup readme for s3 extension and link to repo markdown doc instead of web docs --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-26 14:03:11 -07:00
Vadim Ogievetsky	3a7e4efdd6	Docs: updating Kafka input format docs (#14049) * updating Kafka input format docs * typo * spellcheck * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-11 20:06:23 -07:00
Vadim Ogievetsky	5ee4ecee62	Web console: use new sampler features (#14017) * use new sampler features * supprot kafka format * update DQT, fix tests * prefer non numeric formats * fix input format step * boost SQL data loader * delete dimension in auto discover mode * inline example specs * feedback updates * yeet the format into valueFormat when switching to kafka * kafka format is now a toggle * even better form layout * rename	2023-04-07 06:28:29 -07:00
Tejaswini Bandlamudi	ccf48245d7	Update documentation for Kafka Supervisor IdleConfig (#14032)	2023-04-05 21:55:39 +05:30
Rishabh Singh	e8e8082573	Update OIDCConfig with scope information (#13973) Allow users to provide custom scope through OIDC configuration	2023-03-28 14:50:00 +05:30
Atul Mohan	19db32d6b4	Add JWT authenticator support for validating ID Tokens (#13242) Expands the OIDC based auth in Druid by adding a JWT Authenticator that validates ID Tokens associated with a request. The existing pac4j authenticator works for authenticating web users while accessing the console, whereas this authenticator is for validating Druid API requests made by Direct clients. Services already supporting OIDC can attach their ID tokens to the Druid requests under the Authorization request header.	2023-03-25 18:41:40 +05:30
Jill Osborne	976d39281f	Fix some broken links in docs (#13968)	2023-03-24 10:48:23 -07:00
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Anshu Makkar	a10e4150d5	Add Post Aggregators for Tuple Sketches (#13819) You can now do the following operations with TupleSketches in Post Aggregation Step Get the Sketch Output as Base64 String Provide a constant Tuple Sketch in post-aggregation step that can be used in Set Operations Get the Estimated Value(Sum) of Summary/Metrics Objects associated with Tuple Sketch	2023-03-03 09:32:09 +05:30
317brian	b4b354b658	docs: fix html nits (#13835)	2023-03-02 11:19:32 -08:00
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
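
The Kinesis adaptive memory management commit (9d4e8053a4, #15360) derives two limits from simple arithmetic: `fetchThreads` is capped so that every thread returning a maximal 10MB GetRecords response stays within the fetch budget, and `recordBufferSizeBytes` defaults to the smaller of 100MB and 10% of heap. The Java sketch below illustrates that arithmetic only. It assumes that "budget (`100MB` or `5% of heap`)" means whichever is smaller, treats MB as MiB, and always allows at least one thread. The class, method, and constant names are hypothetical; this is not Druid's actual implementation.

```java
/**
 * Hypothetical sketch of the fetch-memory arithmetic described in the
 * Kinesis commit message (#15360). Names are illustrative, not Druid's code.
 */
final class KinesisMemorySketch {
    // Per the commit message, one GetRecords call returns at most 10MB (assumed MiB here).
    static final long GET_RECORDS_MAX_BYTES = 10L * 1024 * 1024;
    // Assumed absolute cap on the fetch budget: 100MB (MiB here).
    static final long FETCH_BUDGET_CAP_BYTES = 100L * 1024 * 1024;
    // Assumed absolute cap on the shared record buffer: 100MB (MiB here).
    static final long BUFFER_CAP_BYTES = 100L * 1024 * 1024;

    /** Lowers the requested thread count so that threads * 10MB fits in min(100MB, 5% of heap). */
    static int capFetchThreads(int requestedThreads, long maxHeapBytes) {
        long budget = Math.min(FETCH_BUDGET_CAP_BYTES, (long) (maxHeapBytes * 0.05));
        // Integer division rounds down; keep at least one fetch thread even on tiny heaps.
        int maxThreads = (int) Math.max(1L, budget / GET_RECORDS_MAX_BYTES);
        return Math.min(requestedThreads, maxThreads);
    }

    /** Default byte limit for the shared buffer: the smaller of 100MB and 10% of heap. */
    static long defaultRecordBufferSizeBytes(long maxHeapBytes) {
        return Math.min(BUFFER_CAP_BYTES, (long) (maxHeapBytes * 0.10));
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // example: 8 GiB heap
        System.out.println(capFetchThreads(16, heap));
        System.out.println(defaultRecordBufferSizeBytes(heap));
    }
}
```

With an 8 GiB heap, 5% of heap exceeds the 100MB cap, so the budget is 100MB and a request for 16 threads is lowered to 10, matching the commit's remark that `fetchThreads` will never be more than `10`; the program prints `10` and then `104857600`. With a 512 MiB heap, the 5% term (about 25.6 MiB) wins and only 2 fetch threads would be allowed.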


192 Commits