Commit Graph

1807 Commits

Author SHA1 Message Date
Maytas Monsereenusorn d7f5293364
Add an option for ingestion tasks to drop (mark unused) all existing segments that are contained by the interval in the ingestionSpec (#11025)
* Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer.

* Add option to drop segments after ingestion

* fix checkstyle

* add tests

* add tests

* add tests

* fix test

* add tests

* fix checkstyle

* fix checkstyle

* add docs

* fix docs

* address comments

* address comments

* fix spelling
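
A small sketch of the segment-selection rule named in the title of this entry, using Joda-Time intervals: only existing segments wholly contained by the interval in the ingestionSpec are candidates to be dropped (marked unused). The class and method names below are illustrative, not the task's actual API.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.Interval;

public class DropExistingSegmentsExample
{
  /** Existing segments wholly contained by the ingestion interval are the ones eligible to be marked unused. */
  public static List<Interval> segmentsToMarkUnused(List<Interval> existingSegmentIntervals, Interval ingestionInterval)
  {
    return existingSegmentIntervals.stream()
        .filter(ingestionInterval::contains)
        .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    Interval ingestionInterval = new Interval(
        new DateTime(2021, 1, 1, 0, 0, DateTimeZone.UTC),
        new DateTime(2021, 2, 1, 0, 0, DateTimeZone.UTC)
    );
    Interval inside = new Interval(
        new DateTime(2021, 1, 10, 0, 0, DateTimeZone.UTC),
        new DateTime(2021, 1, 11, 0, 0, DateTimeZone.UTC)
    );
    Interval outside = new Interval(
        new DateTime(2021, 2, 10, 0, 0, DateTimeZone.UTC),
        new DateTime(2021, 2, 11, 0, 0, DateTimeZone.UTC)
    );
    // Only the January segment is contained by the ingestion interval.
    System.out.println(segmentsToMarkUnused(List.of(inside, outside), ingestionInterval));
  }
}
```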
2021-04-01 12:29:36 -07:00
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.
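
A minimal sketch of the column-projection idea described above: the set of columns to read is derived from the timestamp column, the dimensions, the transform inputs, and the metric inputs, rather than being configured separately. The names below are illustrative, not the actual ColumnsFilter API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RequiredColumnsExample
{
  /**
   * Compute which columns must be read from the source, based on what the
   * timestampSpec, dimensions, transforms, and metrics will actually use.
   */
  public static Set<String> requiredColumns(
      String timestampColumn,
      List<String> dimensions,
      List<String> transformInputColumns,
      List<String> metricInputColumns
  )
  {
    final Set<String> columns = new LinkedHashSet<>();
    columns.add(timestampColumn);          // e.g. "__time" or a custom column
    columns.addAll(dimensions);            // columns used as dimensions
    columns.addAll(transformInputColumns); // columns feeding transformSpec expressions
    columns.addAll(metricInputColumns);    // fieldName inputs of metric aggregators
    return columns;
  }

  public static void main(String[] args)
  {
    Set<String> columns = requiredColumns(
        "__time",
        List.of("country", "page"),
        List.of("user_agent"),   // input to a hypothetical transform
        List.of("added")         // input to a hypothetical longSum metric
    );
    System.out.println(columns); // [__time, country, page, user_agent, added]
  }
}
```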

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Jihoon Son a041933017
Allow overlapping intervals for the compaction task (#10912)
* Allow overlapping intervals for the compaction task

* unused import

* line indentation

Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
2021-03-23 11:21:54 -07:00
Maytas Monsereenusorn f19c2e9ce4
If ingested data has sparse columns, ingestion with forceGuaranteedRollup=true can result in imperfect rollup, and the final dimension ordering can differ from the dimensionSpec ordering in the ingestionSpec (#10948)
* add IT

* add IT

* add the fix

* fix checkstyle

* fix compile

* fix compile

* fix test

* fix test

* address comments
2021-03-18 17:04:28 -07:00
Suneet Saldanha 6b0c2e8996
CompactionTask throws exception on conflicting segmentGranularity (#10996)
* CompactionTask throws exception on conflicting segmentGranularity

* add comment
2021-03-16 12:51:50 -07:00
Maytas Monsereenusorn ed91a2bb38
Fix: Kinesis should not increment the throwAway count on an EOS record (#10976)
* fix Kinesis incrementing throwAway on EOS record

* fix checkstyle

* fix IT

* fix test

* fix IT

* fix IT

* fix IT

* fix IT
2021-03-11 22:04:58 -08:00
zhangyue19921010 bddacbb1c3
Dynamic auto-scaling of Kafka-stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test file function

* change code and add docs based on capistrant's review

* modify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to avoid putting more stuff inside it, and make the autoscaling algorithm configurable and scalable.

* fix ci failed

* revert misc.xml

* add UTs to test autoscaler creation, scale out/in, and Kafka ingest with scaling enabled

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NPE

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when creating allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
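
A rough sketch of the lag-based scaling decision described in the bullets above. The thresholds, bounds, and method names here are illustrative stand-ins, not the actual autoScalerConfig or supervisor code.

```java
import java.util.List;

public class LagBasedScaleExample
{
  /**
   * Decide a new task count from recent Kafka lag samples. Purely illustrative:
   * the real supervisor autoscaler is configurable and more sophisticated.
   */
  public static int decideTaskCount(
      List<Long> lagSamples,
      int currentTaskCount,
      int taskCountMin,
      int taskCountMax,
      long scaleOutThreshold,
      long scaleInThreshold
  )
  {
    double avgLag = lagSamples.stream().mapToLong(Long::longValue).average().orElse(0);
    if (avgLag > scaleOutThreshold) {
      // Scale out, bounded by taskCountMax (which the PR also caps at the partition count).
      return Math.min(currentTaskCount * 2, taskCountMax);
    }
    if (avgLag < scaleInThreshold) {
      // Scale in, bounded by taskCountMin.
      return Math.max(currentTaskCount / 2, taskCountMin);
    }
    return currentTaskCount; // lag within the comfort zone, keep the current count
  }

  public static void main(String[] args)
  {
    int next = decideTaskCount(List.of(9_000_000L, 11_000_000L), 2, 1, 8, 6_000_000L, 1_000_000L);
    System.out.println(next); // 4
  }
}
```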
2021-03-06 14:36:52 +05:30
Maytas Monsereenusorn b7b0ee8362
Add query granularity to compaction task (#10900)
* add query granularity to compaction task

* fix checkstyle

* fix checkstyle

* fix test

* fix test

* add tests

* fix test

* fix test

* cleanup

* rename class

* fix test

* fix test

* add test

* fix test
2021-03-02 11:23:52 -08:00
Gian Merlino 07902f607b
Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904)
* Granularity: Introduce primitive-typed bucketStart, increment methods.

Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.
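
A tiny illustration of the idea (not the actual Granularity API): flooring an epoch-millis timestamp with primitive long math avoids allocating DateTime objects on the timestamp_floor hot path. This only works for fixed-width UTC buckets such as HOUR.

```java
public class PrimitiveBucketStartExample
{
  private static final long HOUR_MILLIS = 60L * 60 * 1000;

  /** Floor a UTC epoch-millis timestamp to the start of its hour bucket, without creating a DateTime. */
  public static long hourBucketStart(long timestampMillis)
  {
    return timestampMillis - Math.floorMod(timestampMillis, HOUR_MILLIS);
  }

  public static void main(String[] args)
  {
    long t = 1_614_556_800_000L + 1234L; // some instant shortly after an hour boundary
    System.out.println(hourBucketStart(t)); // 1614556800000
  }
}
```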

* Fix style.

* Amp up the test coverage.
2021-02-25 07:59:20 -08:00
Agustin Gonzalez eabad0fb35
Keep query granularity of compacted segments after compaction (#10856)
* Keep query granularity of compacted segments after compaction

* Protect against null isRollup

* Fix SpotBugs check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment

* Make sure that NONE is also included when comparing for the finer granularity

* Update integration test check for segment size due to query granularity propagation affecting size

* Minor code cleanup

* Added functional test to verify queryGranularity after compaction

* Minor style fix

* Update unit tests
2021-02-18 01:35:10 -08:00
Maytas Monsereenusorn 6541178c21
Support segmentGranularity for auto-compaction (#10843)
* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* resolve conflict

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* fix tests

* fix more tests

* fix checkstyle

* add unit tests

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add unit tests

* add integration tests

* fix checkstyle

* fix checkstyle

* fix failing tests

* address comments

* address comments

* fix tests

* fix tests

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test
2021-02-12 03:03:20 -08:00
Egor Riashin 5d8a69ecf9
fixed input source sampler buildReader exp (#10651)
* fixed input source sampler buildReader exp

* sampler exception unit test

* styling

Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>
2021-02-05 13:33:55 -08:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Jihoon Son 3f8f00a231
Fix CVE-2021-25646 (#10818) 2021-02-04 11:21:43 -08:00
Agustin Gonzalez 0e4750bac2
Granularity interval materialization (#10742)
* Prevent interval materialization for UniformGranularitySpec inside the overlord

* Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval>

* Javadoc update, respect inputIntervals contract

* Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec

* Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity

* Fix Travis style & other checks

* Refactor TreeSet to facilitate re-use in UniformGranularitySpec

* Make sure intervals are unique when there is no segment granularity

* Style/SpotBugs fixes...

* More travis checks

* Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method

* Style & PR feedback

* Fixed failing test

* Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change)

* Refactor so that we can get the condensed buckets without materializing the intervals

* Get rid of GranularitySpec::condensedInputIntervals ... not needed

* Travis failures fixes

* Travis checkstyle fix

* Edited/added javadoc comments and a method name (code review feedback)

* Fixed jacoco coverage by moving class and adding more coverage

* Avoid materializing the condensed intervals when locking

* Deal with overlapping intervals

* Remove code and use library code instead

* Refactor intervals by granularity using the FluentIterable, add sanity checks

* Change !hasNext() to inputIntervals().isEmpty()

* Remove redundant lambda

* Use materialized intervals here since this is outside the overlord (for performance)

* Name refactor to reflect the fact that bucket intervals are sorted.

* Style fixes

* Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense)

* Remove forbidden api

* Move helper class inside common base class to reduce public space pollution
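
A minimal sketch of the lazy bucketing idea from the bullets above, written against Joda-Time directly rather than Druid's GranularitySpec or IntervalsByGranularity: buckets are produced one at a time from an iterator instead of materializing the whole interval list up front (it assumes the input starts on a day boundary).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.Interval;

public class LazyDayBucketsExample
{
  /** Lazily produce the day-sized buckets covering an interval, instead of building a full list. */
  public static Iterable<Interval> dayBuckets(Interval input)
  {
    return () -> new Iterator<Interval>()
    {
      private DateTime cursor = input.getStart();

      @Override
      public boolean hasNext()
      {
        return cursor.isBefore(input.getEnd());
      }

      @Override
      public Interval next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        DateTime bucketEnd = cursor.plusDays(1);
        Interval bucket = new Interval(cursor, bucketEnd);
        cursor = bucketEnd;
        return bucket;
      }
    };
  }

  public static void main(String[] args)
  {
    Interval input = new Interval(
        new DateTime(2021, 1, 1, 0, 0, DateTimeZone.UTC),
        new DateTime(2021, 1, 4, 0, 0, DateTimeZone.UTC)
    );
    for (Interval bucket : dayBuckets(input)) {
      System.out.println(bucket); // three buckets, produced one at a time
    }
  }
}
```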
2021-01-29 06:02:10 -08:00
Abhishek Agarwal 0080e333cc
Fix cardinality estimation (#10762)
* Fix cardinality estimation

* Add unit test

* code coverage

* fix typo
2021-01-28 15:06:10 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take Sink/Hydrant object overhead into account (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
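
A back-of-the-envelope sketch of the accounting described above: the bytes-in-memory estimate adds a per-Sink and per-Hydrant object overhead on top of the raw row data, unless the skipBytesInMemoryOverheadCheck option is set. The overhead constants and names below are illustrative only, not Druid's actual values.

```java
public class BytesInMemoryExample
{
  // Illustrative constants; the real per-object overheads are defined inside Druid.
  private static final long SINK_OVERHEAD_BYTES = 5_000;
  private static final long HYDRANT_OVERHEAD_BYTES = 1_000;

  /** Estimate bytes in memory, optionally including Sink/Hydrant object overhead. */
  public static long estimateBytesInMemory(
      long rowDataBytes,
      int sinkCount,
      int hydrantCount,
      boolean skipBytesInMemoryOverheadCheck
  )
  {
    if (skipBytesInMemoryOverheadCheck) {
      return rowDataBytes; // old behavior: count only the ingested row data
    }
    return rowDataBytes
           + sinkCount * SINK_OVERHEAD_BYTES
           + hydrantCount * HYDRANT_OVERHEAD_BYTES;
  }

  public static void main(String[] args)
  {
    System.out.println(estimateBytesInMemory(10_000_000, 3, 12, false)); // 10027000
  }
}
```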
2021-01-27 00:34:56 -08:00
zhangyue19921010 2590ad4f67
Historicals automatically unload damaged segments when lazy loading on start. (#10688)
* ready to test

* tested on dev cluster

* tested

* code review

* add UTs

* add UTs

* ut passed

* ut passed

* optimize imports

* done

* done

* fix checkstyle

* modify uts

* modify logs

* changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment

* merge from master

* modify import orders

* merge from master

* merge from master

* modify logs

* modify docs

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-16 19:53:30 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Jihoon Son ca32652932
Fix potential deadlock in batch ingestion (#10736)
* Fix potential deadlock in batch ingestion

* fix checkstyle and comment

* this is better
2021-01-12 12:50:45 -08:00
Xavier Léauté 118b50195e
Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730)
Today Kafka message support in streaming indexing tasks is limited to
message values, and does not provide a way to expose Kafka headers,
timestamps, or keys, which may be of interest to more specialized
Druid input formats. For instance, Kafka headers may be used to indicate
payload format/encoding or additional metadata, and timestamps are often
omitted from values in Kafka streams applications, since they are
included in the record.

This change proposes to introduce KafkaRecordEntity as InputEntity,
which would give input formats full access to the underlying Kafka record,
including headers, key, timestamps. It would also open access to low-level
information such as topic, partition, offset if needed.

KafkaEntity is a subclass of ByteEntity for backwards compatibility with
existing input formats, and to avoid introducing unnecessary complexity
for Kinesis indexing tasks.
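
A stripped-down sketch of the concept using only the Kafka client API: wrap the full ConsumerRecord so an input format can reach headers, key, timestamp, and the topic/partition/offset coordinates in addition to the value bytes. The real KafkaRecordEntity extends Druid's ByteEntity; this standalone wrapper only illustrates what it exposes.

```java
import java.nio.ByteBuffer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Headers;

public class KafkaRecordEntityExample
{
  private final ConsumerRecord<byte[], byte[]> record;

  public KafkaRecordEntityExample(ConsumerRecord<byte[], byte[]> record)
  {
    this.record = record;
  }

  /** Value payload, as existing byte-oriented input formats expect. */
  public ByteBuffer getBuffer()
  {
    return ByteBuffer.wrap(record.value());
  }

  /** Extras that were previously invisible to input formats. */
  public Headers getHeaders()
  {
    return record.headers();
  }

  public byte[] getKey()
  {
    return record.key();
  }

  public long getTimestamp()
  {
    return record.timestamp();
  }

  /** Low-level coordinates, available if an input format ever needs them. */
  public String getTopic()
  {
    return record.topic();
  }

  public int getPartition()
  {
    return record.partition();
  }

  public long getOffset()
  {
    return record.offset();
  }
}
```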
2021-01-08 16:04:37 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Jihoon Son ea2d51d61f
Better error message for compaction task when it sees no segments for compaction (#10728) 2021-01-05 16:49:57 -08:00
Jonathan Wei 769c21cc87
Add sample method to IndexingServiceClient (#10729)
* Add sample method to IndexingServiceClient

* Add unit test

* Fix LGTM
2021-01-05 15:02:44 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix a thing that must not matter since the IP was already wrong; log when the test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
Gian Merlino 9acab0b646
DruidInputSource: Sort segments by ID before grouping into splits. (#10646)
This is useful because it groups up segments for the same time chunk
into the same splits, which in turn is useful because it minimizes the
number of time chunks that each task will have to deal with.
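
A small illustration of the grouping idea in plain Java: sorting segment identifiers (whose natural order keeps a time chunk's segments adjacent) and then chunking the sorted list means segments of the same time chunk tend to share a split. Names and the ID format are illustrative, not the DruidInputSource implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SegmentSplitExample
{
  /** Sort segment IDs, then group consecutive IDs into splits of at most maxSegmentsPerSplit. */
  public static List<List<String>> makeSplits(List<String> segmentIds, int maxSegmentsPerSplit)
  {
    List<String> sorted = new ArrayList<>(segmentIds);
    Collections.sort(sorted); // IDs for the same time chunk sort next to each other
    List<List<String>> splits = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i += maxSegmentsPerSplit) {
      splits.add(sorted.subList(i, Math.min(i + maxSegmentsPerSplit, sorted.size())));
    }
    return splits;
  }

  public static void main(String[] args)
  {
    List<String> ids = List.of(
        "wiki_2021-01-02T00:00:00.000Z_2021-01-03T00:00:00.000Z_v1_1",
        "wiki_2021-01-01T00:00:00.000Z_2021-01-02T00:00:00.000Z_v1_0",
        "wiki_2021-01-01T00:00:00.000Z_2021-01-02T00:00:00.000Z_v1_1",
        "wiki_2021-01-02T00:00:00.000Z_2021-01-03T00:00:00.000Z_v1_0"
    );
    makeSplits(ids, 2).forEach(System.out::println); // the two 2021-01-01 segments share a split
  }
}
```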
2020-12-07 13:48:24 -08:00
egor-ryashin f46cc4faaf Revert "fixed input source sampler buildReader exp"
This reverts commit e688db8
2020-12-07 18:34:59 +03:00
egor-ryashin e688db8503 fixed input source sampler buildReader exp 2020-12-07 18:28:25 +03:00
Gian Merlino b7641f644c
Two fixes related to encoding of % symbols. (#10645)
* Two fixes related to encoding of % symbols.

1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
   returns already-decoded strings. Applying StringUtils.urlDecode on
   top of that causes erroneous behavior with '%' characters.

2) Update various ThreadFactoryBuilder name formats to escape '%'
   characters. This fixes situations where substrings starting with '%'
   are erroneously treated as format specifiers.

ITs are updated to include a '%' in extra.datasource.name.suffix.
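
A short demonstration of the second fix's failure mode, using Guava's ThreadFactoryBuilder: a literal '%' in a thread-name format is treated as a format specifier by String.format, so it must be escaped as '%%' before being passed to setNameFormat. The escaping helper below is a stand-in for the encodeForFormat utility adjusted in the bullets below.

```java
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import java.util.concurrent.ThreadFactory;

public class PercentEscapeExample
{
  /** Stand-in for the escaping helper: make a string safe to embed in a format pattern. */
  private static String escapeForFormat(String s)
  {
    return s.replace("%", "%%");
  }

  public static void main(String[] args)
  {
    String dataSource = "metrics%20prod"; // a made-up name containing a literal '%'

    // Unsafe: in "metrics%20prod-%d" the "%2" is parsed as part of a format specifier,
    // so formatting the thread name fails with an exception:
    // new ThreadFactoryBuilder().setNameFormat(dataSource + "-%d").build();

    // Safe: escape '%' as '%%' so only the trailing "%d" is interpreted.
    ThreadFactory factory = new ThreadFactoryBuilder()
        .setNameFormat(escapeForFormat(dataSource) + "-%d")
        .build();
    System.out.println(factory.newThread(() -> {}).getName()); // metrics%20prod-0
  }
}
```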

* Avoid String.replace.

* Work around surefire bug.

* Fix xml encoding.

* Another try at the proper encoding.

* Give up on the emojis.

* Less ambitious testing.

* Fix an additional problem.

* Adjust encodeForFormat to return null if the input is null.
2020-12-06 22:35:11 -08:00
Gian Merlino 17f39ab91e
Fix misspellings in druid-forbidden-apis. (#10634)
These caused certain APIs to not actually be properly forbidden.

Also removed two MoreExecutors entries for methods that don't exist in
our version of Guava.
2020-12-05 15:26:57 -08:00
Jihoon Son 7462b0b953
Allow missing intervals for Parallel task with hash/range partitioning (#10592)
* Allow missing intervals for Parallel task

* fix row filter

* fix tests

* fix log
2020-11-25 14:50:22 -08:00
frank chen fe693a4f01
Improve doc and exception message for invalid user configurations (#10598)
* improve doc and exception message

* add spelling check rules and remove unused import

* add a test to improve test coverage
2020-11-23 15:03:13 -08:00
Lucas Capistrant 3447934a75
Ensure Krb auth before killing YARN apps in graceful shutdown (#9785) 2020-11-16 09:59:14 -08:00
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
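
A minimal sketch of why naive line-splitting is not enough, using plain Jackson: a MappingIterator can pull several JSON objects out of one text block even when they are pretty-printed across lines, which is the behavior the reworked JsonReader needs. This illustrates the parsing approach only, not Druid's JsonReader code.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

public class PrettyJsonReadExample
{
  public static void main(String[] args) throws IOException
  {
    // Two pretty-printed JSON objects in one text block; a naive "one object per line"
    // reader would fail to parse this.
    String block = "{\n  \"page\": \"Main\",\n  \"added\": 10\n}\n{\n  \"page\": \"Talk\",\n  \"added\": 3\n}";

    ObjectMapper mapper = new ObjectMapper();
    MappingIterator<JsonNode> it = mapper.readerFor(JsonNode.class).readValues(block);
    while (it.hasNext()) {
      JsonNode row = it.next();
      System.out.println(row.get("page").asText() + " -> " + row.get("added").asInt());
    }
    // Main -> 10
    // Talk -> 3
  }
}
```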
2020-11-13 13:59:23 -08:00
Himanshu ee136303bb
optionally disable all hardcoded ZooKeeper use (#9507)
* optionally disable all hardcoded ZooKeeper use

* fix DruidCoordinatorTest compilation

* fix test in DruidCoordinatorTest

* fix strict compilation

Co-authored-by: Himanshu Gupta <fill email>
2020-10-26 22:35:59 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Jihoon Son e9e7d82714
Fix compaction task slot computation in auto compaction (#10479)
* Fix compaction task slot computation in auto compaction

* add tests for task counting
2020-10-06 21:56:03 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Abhishek Agarwal e282ab5695
Fix the task id creation in CompactionTask (#10445)
* Fix the task id creation in CompactionTask

* review comments

* Ignore test for range partitioning and segment lock
2020-10-01 15:02:04 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that better represents the metrics that we want to monitor

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not the leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Jonathan Wei cb30b1fe23
Automatically determine numShards for parallel ingestion hash partitioning (#10419)
* Automatically determine numShards for parallel ingestion hash partitioning

* Fix inspection, tests, coverage

* Docs and some PR comments

* Adjust locking

* Use HllSketch instead of HyperLogLogCollector

* Fix tests

* Address some PR comments

* Fix granularity bug

* Small doc fix
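
A rough sketch of the approach named in the bullets above, using the DataSketches library directly: feed partition keys into an HllSketch and derive numShards from the cardinality estimate and a target rows-per-segment figure. The names and key construction are illustrative, not the actual task logic.

```java
import java.util.List;
import org.apache.datasketches.hll.HllSketch;

public class NumShardsEstimateExample
{
  /** Estimate how many hash shards are needed for roughly targetRowsPerSegment rows each. */
  public static int estimateNumShards(List<String> partitionKeys, long targetRowsPerSegment)
  {
    HllSketch sketch = new HllSketch(12); // lgK = 12, a modest-size sketch
    for (String key : partitionKeys) {
      sketch.update(key);
    }
    double distinctRows = sketch.getEstimate();
    return Math.max(1, (int) Math.ceil(distinctRows / targetRowsPerSegment));
  }

  public static void main(String[] args)
  {
    List<String> keys = List.of("2021-01-01|a", "2021-01-01|b", "2021-01-01|a", "2021-01-01|c");
    System.out.println(estimateNumShards(keys, 2)); // ~3 distinct keys / 2 per segment -> 2
  }
}
```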
2020-09-24 13:47:53 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Jihoon Son f82fd22fa7
Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308)
* Move tools for indexing to TaskToolbox instead of injecting them in constructor

* oops, other changes

* fix test

* unnecessary new file

* fix test

* fix build
2020-08-26 17:08:12 -07:00
Gian Merlino 21703d81ac
Fix handling of 'join' on top of 'union' datasources. (#10318)
* Fix handling of 'join' on top of 'union' datasources.

The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.

The main changes are in UnionQueryRunner:

1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".

Together, these enable UnionQueryRunner to "see through" a join.
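
A toy illustration of the rewrite described above, with made-up types rather than Druid's query classes: a query over a union is re-run once per underlying table and the results are concatenated; the fix makes this rewrite apply even when the union sits beneath a join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class UnionRewriteExample
{
  /**
   * Rewrite a query over a union of tables into one query per table and concatenate
   * the results. "Query" here is just a table name plus a filter string.
   */
  public static List<String> runOverUnion(
      List<String> unionTables,
      String filter,
      BiFunction<String, String, List<String>> runSingleTableQuery
  )
  {
    List<String> results = new ArrayList<>();
    for (String table : unionTables) {
      // Same query, re-targeted at each underlying table of the union.
      results.addAll(runSingleTableQuery.apply(table, filter));
    }
    return results;
  }

  public static void main(String[] args)
  {
    BiFunction<String, String, List<String>> fakeRunner =
        (table, filter) -> List.of(table + " rows matching " + filter);
    System.out.println(runOverUnion(List.of("t1", "t2", "t3"), "country='US'", fakeRunner));
  }
}
```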

* Tests.

* Adjust heap sizes for integration tests.

* Different approach, more tests.

* Tweak.

* Styling.
2020-08-26 14:23:54 -07:00