druid

Commit Graph

Author	SHA1	Message	Date
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Peter Marshall	60fdf7a734	Rollup measurement query amended (#11479 ) By user request from https://groups.google.com/g/druid-user/c/bFkOtE-1eQg - gives the measure as a floating point instead of an integer.	2021-07-27 06:29:29 -07:00
jerryleooo	c7fdf1d685	Fix typo in ingestion spec sample (#11433 ) * Update index.md Fix typo in the ingestion spec sample * fixed more typos	2021-07-19 22:02:21 -07:00
Lasse Krogh Mammen	9be2a5cdc2	Add documentation re alphabetical sorted of MV dimensions (#10695 )	2021-05-07 01:12:32 -07:00
sthetland	ca1412d574	Reduce visibility of Tranquility documentation (#11134 ) * reduce visibility of tranquility doc Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-05-03 16:48:24 -07:00
Gian Merlino	a47c0d2579	Clarify meaning of "root-level fields" in the documentation. (#11143 )	2021-04-24 11:06:08 +08:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Maytas Monsereenusorn	a46d561bd7	Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740 ) * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix checkstyle * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix test * fix test * add log * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * address comments * fix checkstyle * fix checkstyle * add config to skip overhead memory calculation * add test for the skipBytesInMemoryOverheadCheck config * add docs * fix checkstyle * fix checkstyle * fix spelling * address comments * fix travis * address comments	2021-01-27 00:34:56 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Gian Merlino	d36a0f61da	Clarify documentation on dimensions, dimensionExclusions. (#10265 ) In particular: exclusions are ignored if dimensions are set.	2020-08-12 08:06:53 -07:00
mans2singh	d4bd6e5207	ingestion and tutorial doc update (#10202 )	2020-07-21 17:52:23 -07:00
Jian Wang	85dfbb64cb	Update documention for metricCompression (#9811 )	2020-05-03 12:56:48 -07:00
Clint Wylie	bf85ea19b2	roaring bitmaps by default (#9548 ) * it is finally time * fix it * more docs * fix doc	2020-03-23 18:15:57 -07:00
Jihoon Son	153495068b	Doc update for the new input source and the new input format (#9171 ) * Doc update for new input source and input format. - The input source and input format are promoted in all docs under docs/ingestion - All input sources including core extension ones are located in docs/ingestion/native-batch.md - All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md - New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md * parquet * add warning for range partitioning with sequential mode * hdfs + s3, gs * add fs impl for gs * address comments * address comments * gcs	2020-01-17 15:52:05 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Vadim Ogievetsky	f9b94a5db1	Docs: remove self link (#8760 ) This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.	2019-10-27 22:33:22 -07:00
Lucas Capistrant	d801ce2f29	Update rollup table to properly reflect 0.16.0 (#8638 ) This table stated that `index_parallel` tasks were best-effort only. However, this changed with #8061 and this documentation update was simply missed.	2019-10-07 12:37:15 -07:00
Chi Cao Minh	7dcbaca658	Spellcheck docs (#8548 ) * Spellcheck docs Fix spelling mistakes in docs and add CI job for running spellcheck on docs. * Add missing license header	2019-09-17 12:47:30 -07:00
Gian Merlino	d007477742	Docusaurus build framework + ingestion doc refresh. (#8311 ) * Docusaurus build framework + ingestion doc refresh. * stick to npm instead of yarn * fix typos * restore some _bin * Adjustments. * detect and fix redirect anchors * update anchor lint * Web-console: remove specific column filters (#8343) * add clear filter * update tool kit * remove usless check * auto run * add % * Fix resource leak (#8337) * Fix resource leak * Patch comments * Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234) * Fixes from PR review. * Fix more anchors. * Preamble nix. * Fix more anchors, headers * clean up placeholder page * add to website lint to travis config * better broken link checking * travis fix * Fixed more broken links * better redirects * unfancy catch * fix LGTM error * link fixes * fix md issues * Addl fixes	2019-08-20 21:48:59 -07:00

20 Commits