druid

Commit Graph

Author	SHA1	Message	Date
Vadim Ogievetsky	5ee4ecee62	Web console: use new sampler features (#14017 ) * use new sampler features * supprot kafka format * update DQT, fix tests * prefer non numeric formats * fix input format step * boost SQL data loader * delete dimension in auto discover mode * inline example specs * feedback updates * yeet the format into valueFormat when switching to kafka * kafka format is now a toggle * even better form layout * rename	2023-04-07 06:28:29 -07:00
Clint Wylie	e3211e3be0	actually backwards compatible frontCoded string encoding strategy (#13996 )	2023-03-31 02:24:12 -07:00
Clint Wylie	2219e68fa3	add backwards compat mode for frontCoded stringEncodingStrategy (#13988 )	2023-03-28 14:44:44 -07:00
Jill Osborne	976d39281f	Fix some broken links in docs (#13968 )	2023-03-24 10:48:23 -07:00
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
317brian	b4b354b658	docs: fix html nits (#13835 )	2023-03-02 11:19:32 -08:00
Jill Osborne	26c5cac41a	Fix a link problem (#13876 )	2023-03-02 09:09:51 -08:00
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Katya Macedo	bc8b710b7e	Fix broken link (#13767 )	2023-02-17 09:02:12 -08:00
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
Victoria Lim	00cee329bd	pitfall when using combining input source (#13639 )	2023-01-25 12:50:19 -08:00
Kashif Faraz	d6949b1b79	Track input processedBytes with MSQ ingestion (#13559 ) Follow up to #13520 Bytes processed are currently tracked for intermediate stages in MSQ ingestion. This patch adds the capability to track the bytes processed by an MSQ controller task while reading from an external input source or a segment source. Changes: - Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader` - Update `ChannelCounters` with the above obtained `processedBytes` when incrementing the input file count. - Update task report structure in docs The total input processed bytes can be obtained by summing the `processedBytes` as follows: totalBytes = 0 for every root stage (i.e. a stage which does not have another stage as an input): for every worker in that stage: for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.) totalBytes += processedBytes	2022-12-16 02:20:01 +05:30
Jill Osborne	b56855b837	Update to native ingestion doc (#13482 ) * Update to native ingestion doc * Update docs/ingestion/native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-12-07 15:08:19 +05:30
Kashif Faraz	c7229fc787	Limit max batch size for segment allocation, add docs (#13503 ) Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`	2022-12-07 10:07:14 +05:30
Jill Osborne	138a6de507	Update nested columns docs (#13461 ) * Update nested columns docs (cherry picked from commit `04206c5179`) * Update nested-columns.md (cherry picked from commit `8085ee7217`)	2022-12-01 10:47:32 -08:00
Jill Osborne	db7c29c6f9	Correction to firehose migration doc (#13423 )	2022-11-28 10:24:27 +05:30
Jill Osborne	68018a808f	Firehose migration doc (#12981 ) * Firehose migration doc * Update migrate-from-firehose-ingestion.md * Updated with review comments and suggestions * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md	2022-11-21 11:17:12 -08:00
Jill Osborne	a860baf496	Updated docs on front coding (#13387 )	2022-11-19 00:01:04 -08:00
Jill Osborne	47dd4ed2e7	Added experimental feature text for front coding feature (#13349 )	2022-11-11 02:06:13 -08:00
Didip Kerabat	56d5c9780d	Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027 ) * Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects. Removed: import org.apache.commons.io.FilenameUtils; Add: import java.nio.file.FileSystems; import java.nio.file.PathMatcher; import java.nio.file.Paths; * Forgot to update CloudObjectInputSource as well. * Fix tests. * Removed unused exceptions. * Able to reduced user mistakes, by removing the protocol and the bucket on filter. * add 1 more test. * add comment on filterWithoutProtocolAndBucket * Fix lint issue. * Fix another lint issue. * Replace all mention of filter -> objectGlob per convo here: https://github.com/apache/druid/pull/13027#issuecomment-1266410707 * fix 1 bad constructor. * Fix the documentation. * Don’t do anything clever with the object path. * Remove unused imports. * Fix spelling error. * Fix incorrect search and replace. * Addressing Gian’s comment. * add filename on .spelling * Fix documentation. * fix documentation again Co-authored-by: Didip Kerabat <didip@apple.com>	2022-11-10 23:46:40 -08:00
Andreas Maechler	03175a2b8d	Add missing MSQ error code fields to docs (#13308 ) * Fix typo * Fix some spacing * Add missing fields * Cleanup table spacing * Remove durable storage docs again Thanks Brian for pointing out previous discussions. * Update docs/multi-stage-query/reference.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Mark codes as code * And even more codes as code * Another set of spaces * Combine `ColumnTypeNotSupported` Thanks Karan. * More whitespaces and typos * Add spelling and fix links Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-10 21:03:04 +05:30
Jill Osborne	c2210c4e09	Update ingestion spec doc (#13329 ) * Update ingestion spec doc * Updated * Updated * Update docs/ingestion/ingestion-spec.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Updated * Updated Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2022-11-10 02:54:35 -08:00
Gian Merlino	48528a0c98	MSQ: Fix task lock checking during publish, fix lock priority. (#13282 ) * MSQ: Fix task lock checking during publish, fix lock priority. Fixes two issues: 1) ControllerImpl did not properly check the return value of SegmentTransactionalInsertAction when doing a REPLACE. This could cause it to not realize that its locks were preempted. 2) Task lock priority was the default of 0. It should be the higher batch default of 50. The low priority made it possible for MSQ tasks to be preempted by compaction tasks, which is not desired. * Restructuring, add docs. * Add performSegmentPublish tests. * Fix tests.	2022-11-08 09:27:34 +05:30
Jason Koch	0d03ce435f	introduce a "tree" type to the flattenSpec (#12177 ) * introduce a "tree" type to the flattenSpec * feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard * feedback - expand docs to more clearly capture limitations of "tree" flattenSpec * feedback - fix for typo on docs * introduce a comment to explain defensive copy, tweak null handling * fix: part of rebase * mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type * fix: objectflattener restore previous behavior to call getRootField for root type * docs: ingestion/data-formats add note that ORC only supports path expressions * chore: linter remove unused import * fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark	2022-11-01 14:49:30 +08:00
Clint Wylie	77e4246598	add support for 'front coded' string dictionaries for smaller string columns (#12277 ) * add FrontCodedIndexed for delta string encoding * now for actual segments * fix indexOf * fixes and thread safety * add bucket size 4, which seems generally better * fixes * fixes maybe * update indexes to latest interfaces * utf8 support * adjust * oops * oops * refactor, better, faster * more test * fixes * revert * adjustments * fix prefixing * more chill * sql nested benchmark too * refactor * more comments and javadocs * better get * remove base class * fix * hot rod * adjust comments * faster still * minor adjustments * spatial index support * spotbugs * add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs * fix docs * push into constructor * use base buffer instead of copy * oops	2022-10-25 18:05:38 -07:00
317brian	c83115e4e1	api: change API page formatting (#13213 ) Tracking additional improvements requested by @paul-rogers: #13239 * api: refactor page so that indented bullet is child and unindented portion is parent * get rid of post etc headings and combine them with the endpoint * Update docs/operations/api-reference.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix broken links * fix typo Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-10-18 13:22:26 -07:00
Jonathan Wei	9b8e69c99a	Add inline descriptor Protobuf bytes decoder (#13192 ) * Add inline descriptor Protobuf bytes decoder * PR comments * Update tests, check for IllegalArgumentException * Fix license, add equals test * Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-11 13:37:28 -05:00
Jonathan Wei	1f1fced6d4	Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089 ) * Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON * Fix serde and docs * Add PR comment check	2022-09-26 19:51:04 -05:00
Vadim Ogievetsky	bb0b810b1d	fix html tags in docs (#13117 ) * fix html tags in docs * revert not null	2022-09-18 19:40:33 -07:00
Gian Merlino	d4967c38f8	Various documentation updates. (#13107 ) * Various documentation updates. 1) Split out "data management" from "ingestion". Break it into thematic pages. 2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so all conceptual content is in concepts.md and all syntax content is in reference.md. Shorten the known issues page to the most interesting ones. 3) Add SQL-based ingestion to the ingestion method comparison page. Remove the index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1. 4) Rename various mentions of "Druid console" to "web console". 5) Add additional information to ingestion/partitioning.md. 6) Remove a mention of Tranquility. 7) Remove a note about upgrading to Druid 0.10.1. 8) Remove no-longer-relevant task types from ingestion/tasks.md. 9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated. 10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some places, but it isn't very useful compared to index_parallel, so it shouldn't take up space in the sidebar. 11) Make all br tags self-closing. 12) Certain other cosmetic changes. 13) Update to node-sass 7. * make travis use node12 for docs Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>	2022-09-16 21:58:11 -07:00
Vadim Ogievetsky	2493eb17bf	Doc fixes around msq (#13090 ) * remove things that do not apply * fix more things * pin node to a working version * fix * fixes * known issues tidy up * revert auto formatting changes * remove management-uis page which is 100% lies * don't mention the Coordinator console (that no longer exits) * goodies * fix typo	2022-09-16 02:15:26 -07:00
Jill Osborne	1f69140623	Nested columns documentation (#12946 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: brian.le <brian.le@imply.io>	2022-09-06 14:42:18 -07:00
Gian Merlino	85d2a6d879	Improve range partitioning docs. (#13016 ) Two improvements: - Use a realistic targetRowsPerSegment, so if people copy and paste the example from the docs, it will generate reasonable segments. - Spell "countryName" correctly.	2022-09-01 15:21:30 -07:00
Jill Osborne	7a1e1f88bb	Remove experimental note from stable features (#12973 ) * Removed experimental note for features that are no longer experimental * Updated native batch doc	2022-08-25 09:26:46 -07:00
Victoria Lim	02914c17b9	Tutorial on ingesting and querying Theta sketches (#12723 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-08-24 09:23:22 -07:00
David Hergenroeder	533c39f35a	Fix rollup docs bullet formatting (#12876 )	2022-08-09 10:10:07 +08:00
Katya Macedo	c6dd9dd4af	Fix typo in compaction.md (#12774 )	2022-08-04 14:47:22 -07:00
Charles Smith	efbb58e90e	docs: remove maxRowsPerSegment where appropriate (#12071 ) * remove maxRowsPerSegment where appropriate * fix tutorial, accept suggestions * Update docs/design/coordinator.md * additional tutorial file * fix initial index spec * accept comments * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add back comment on maxrows per segment * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rm duplicate entry * Update native-batch-simple-task.md remove ref to `maxrowspersegment` * Update native-batch.md remove ref to `maxrowspersegment` * final tenticles * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-28 16:52:13 +05:30
Victoria Lim	6394ecfd21	update figure and reference (#12813 )	2022-07-22 15:54:25 -07:00
Katya Macedo	809bf161ce	Add a note about setting the value of maxNumConcurrentSubTasks (#12772 ) * Add clarification for combining input source * Update inputFormat note * Update maxNumConcurrentSubTasks note * Fix broken link * Update docs/ingestion/native-batch-input-source.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-07-19 15:34:21 -07:00
Atul Mohan	75045970cd	S3 Ingestion from non-default endpoints (#11798 ) * Add endpoint support for s3inputsource * Changes to tests * Fix docs * Fix config * Fix inspections * Fix spelling * Remove password from toString	2022-07-15 11:03:34 -07:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update segments.md * Update segments.md * Update segments.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
Gian Merlino	fdfecfd996	Improved docs for range partitioning. (#12350 ) * Improved docs for range partitioning. 1) Clarify the benefits of range partitioning. 2) Clarify which filters support pruning. 3) Include the fact that multi-value dimensions cannot be used for partitioning. * Additional clarification. * Update other section. * Another adjustment. * Updates from review.	2022-05-16 09:42:31 -07:00
Kashif Faraz	60b4fa0f75	Docs: Fix column name in ingestion rollup doc (#12036 ) Fix the referred column name from "count" to "num_rows" as "count" vs. "COUNT(*)" might be a little confusing in this example.	2022-05-10 17:35:59 +05:30
Victoria Lim	0206a2da5c	Update automatic compaction docs with consistent terminology (#12416 ) * specify automatic compaction where applicable * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update for style and consistency * implement suggested feedback * remove duplicate example * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/compaction.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/api-reference.md * update .spelling * Adopt review suggestions Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2022-05-03 16:22:25 -07:00
Charles Smith	42fa5c26e1	remove arbitrary granularity spec from docs (#12460 ) * remove arbitrary granularity spec from docs * Update docs/ingestion/ingestion-spec.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-28 16:36:54 -07:00
Peter Marshall	b47316b844	Update native-batch.md (#12478 ) Fixed indent on the Granularity Spec section and removed some superfluous tabbings.	2022-04-25 21:44:17 +08:00
Charles Smith	408b46ae9f	Fixes a small typo in ingestion spec doc (#12143 ) * small typo * Update docs/ingestion/ingestion-spec.md Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2022-04-18 16:53:50 +08:00
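
The `processedBytes` summation described in commit d6949b1b79 (#13559) can be sketched as follows. This is a minimal illustration only: the dictionary layout, the `input_stages` and `workers` keys, and the function name `total_processed_bytes` are hypothetical stand-ins, not the actual MSQ task report schema. Only the rule itself comes from the commit message: sum `processedBytes` over every channel whose name starts with "input", for every worker of every root stage.

```python
def total_processed_bytes(stages):
    """Sum processedBytes over the input channels of all root stages.

    `stages` is a hypothetical, simplified structure (NOT the real report
    format): a list of dicts, each with
      - "input_stages": stage numbers this stage reads from (empty for a root stage)
      - "workers": mapping of worker number -> {channel name -> counters dict}
    """
    total = 0
    for stage in stages:
        # A root stage is one that does not have another stage as an input.
        if stage["input_stages"]:
            continue
        for channels in stage["workers"].values():
            for name, counters in channels.items():
                # Only channels with the prefix "input" (input0, input1, ...) count.
                if name.startswith("input"):
                    total += counters.get("processedBytes", 0)
    return total


# Made-up example data: stage 0 is a root stage, stage 1 reads from stage 0.
example = [
    {
        "input_stages": [],
        "workers": {
            0: {"input0": {"processedBytes": 100}, "output": {"processedBytes": 40}},
            1: {"input0": {"processedBytes": 250}},
        },
    },
    {
        "input_stages": [0],
        "workers": {0: {"input0": {"processedBytes": 40}}},
    },
]

print(total_processed_bytes(example))  # 350
```

In this made-up example only stage 0 contributes (100 + 250); the intermediate stage and the non-input channel are skipped.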


173 Commits