2697 Commits

Author SHA1 Message Date
Vadim Ogievetsky
07597c687d
Docs: Remove large data file (#13595) 2022-12-19 13:14:22 +05:30
Gian Merlino
ee890965f4
LocalInputSource: Serialize File paths without forcing resolution. (#13534)
* LocalInputSource: Serialize File paths without forcing resolution.

Fixes #13359.

* Add one more javadoc.
2022-12-19 11:47:36 +05:30
Victoria Lim
09d8b16447
Document shouldFinalize for sketches that have the parameter (#13524)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-17 10:48:06 -08:00
317brian
d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342) 2022-12-16 13:33:50 -08:00
Gian Merlino
7f3c117e3a
SQL: Improve docs around casts. (#13466)
Main change: clarify that the "default value" for casts only applies if
druid.generic.useDefaultValueForNull = true.

Secondary change: adjust a bunch of wording from future to present tense.
2022-12-15 15:01:40 -08:00
Kashif Faraz
d6949b1b79
Track input processedBytes with MSQ ingestion (#13559)
Follow up to #13520

Bytes processed are currently tracked for intermediate stages in MSQ ingestion.
This patch adds the capability to track the bytes processed by an MSQ controller
task while reading from an external input source or a segment source.

Changes:
- Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader`
- Update `ChannelCounters` with the above obtained `processedBytes` when incrementing
the input file count.
- Update task report structure in docs

The total input processed bytes can be obtained by summing the `processedBytes` as follows:

totalBytes = 0
for every root stage (i.e. a stage which does not have another stage as an input):
    for every worker in that stage:
        for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.)
            totalBytes += processedBytes
2022-12-16 02:20:01 +05:30
Adarsh Sanjeev
2b605aa9cf
Multiple fixes for the MSQ stats merging piece which (#13463)
* Add validation checks to worker chat handler apis

* Merge things and polishing the error messages.

* Minor error message change

* Fixing race and adding some tests

* Fixing controller fetching stats from wrong workers.
Fixing race
Changing default mode to Parallel
Adding logging.
Fixing exceptions not propagated properly.

* Changing to kernel worker count

* Added a better logic to figure out assigned worker for a stage.

* Nits

* Moving to existing kernel methods

* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
2022-12-15 09:35:11 +05:30
Vadim Ogievetsky
2729e25295
Link to java docs (#13478)
* add link to page about selecting a JRE

* add link to script also

* simplify text
2022-12-14 11:45:23 -08:00
Gian Merlino
de5a4bafcb
Zero-copy local deep storage. (#13394)
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)
2022-12-12 17:28:24 -08:00
Rishabh Singh
4ebdfe226d
Druid automated quickstart (#13365)
* Druid automated quickstart

* remove conf/druid/single-server/quickstart/_common/historical/jvm.config

* Minor changes in python script

* Add lower bound memory for some services

* Additional runtime properties for services

* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py

* File end newline

* Limit the ability to start multiple instances of a service, documentation changes

* simplify script arguments

* restore changes in medium profile

* run-druid refactor

* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging

* Remove extra quotes from mm task javaopts array

* Update logic to compute minimum memory

* simplify run-druid

* remove debug options from run-druid

* resolve the config_path provided

* comment out service specific runtime properties which are computed in the code

* simplify run-druid

* clean up docs, naming changes

* Throw ValueError exception on illegal state

* update docs

* rename args, compute_only -> compute, run_zk -> zk

* update help documentation

* update help documentation

* move task memory computation into separate method

* Add validation checks

* remove print

* Add validations

* remove start-druid bash script, rename start-druid-main

* Include tasks in lower bound memory calculation

* Fix test

* 256m instead of 256g

* caffeine cache uses 5% of heap

* ensure min task count is 2, task count is monotonic

* update configs and documentation for runtime props in conf/druid/single-server/quickstart

* Update docs

* Specify memory argument for each profile in single-server.md

* Update middleManager runtime.properties

* Move quickstart configs to conf/druid/base, add bash launch script, support python2

* Update supervise script

* rename base config directory to auto

* rename python script, changes to pass repeated args to supervise

* remove exmaples/conf/druid/base dir

* add docs

* restore changes in conf dir

* update start-druid-auto

* remove hashref for commands in supervise script

* start-druid-main java_opts array is comma separated

* update entry point script name in python script

* Update help docs

* documentation changes

* docs changes

* update docs

* add support for running indexer

* update supported services list

* update help

* Update python.md

* remove dir

* update .spelling

* Remove dependency on psutil and pathlib

* update docs

* Update get_physical_memory method

* Update help docs

* update docs

* update method to get physical memory on python

* udpate spelling

* update .spelling

* minor change

* Minor change

* memory comptuation for indexer

* update start-druid

* Update python.md

* Update single-server.md

* Update python.md

* run python3 --version to check if python is installed

* Update supervise script

* start-druid: echo message if python not found

* update anchor text

* minor change

* Update condition in supervise script

* JVM not jvm in docs
2022-12-09 11:04:02 -08:00
Paul Rogers
013a12e86f
Enhanced MSQ table functions (#13360)
* Enhanced MSQ table functions
* HTTP, LOCALFILES and INLINE table functions powered by
catalog metadata.
* Documentation
2022-12-08 13:56:02 -08:00
Gian Merlino
91ef9872ec
MSQ: Improve TooManyBuckets error message, improve error docs. (#13525)
1) Edited the TooManyBuckets error message to mention PARTITIONED BY
   instead of segmentGranularity.

2) Added error-code-specific anchors in the docs.

3) Add information to various error codes in the docs about common
   causes and solutions.
2022-12-08 13:18:26 -08:00
Jill Osborne
b56855b837
Update to native ingestion doc (#13482)
* Update to native ingestion doc

* Update docs/ingestion/native-batch.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update native-batch.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-12-07 15:08:19 +05:30
Vadim Ogievetsky
9679f6a9b5
Web console: add arrayOfDoublesSketch and other small fixes (#13486)
* add padding and keywords

* add arrayOfDoubles

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* partiton int

* fix docs

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-06 21:21:49 -08:00
Kashif Faraz
c7229fc787
Limit max batch size for segment allocation, add docs (#13503)
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
2022-12-07 10:07:14 +05:30
Gian Merlino
fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
Kashif Faraz
65945a686f
Docs: Update docs for coordinator dynamic config (#13494)
* Update docs for useBatchedSegmentSampler

* Update docs for round robin assigment
2022-12-05 16:53:10 +05:30
TSFenwick
10bec54acc
Switching emitter. This will allow for a per feed emitter designation. (#13363)
* Switching emitter. This will allow for a per feed emitter designation.

This will work by looking at an event's feed and direct it to a specific emitter. If no specific feed is specified for a feed.
The emitter can direct the event to a default emitter.

* fix checkstyle issues and make docs for switching emitter use basic event feeds

* fix broken docs, add test, and guard against misconfigurations

* add module test
add switching emitter module test

* fix broken SwitchingEmitterModuleTest

* add apache license to top of test

* fix checkstyle issues

* address comments by adding javadocs, removing a todo, and making druid docs more clear
2022-12-05 16:04:34 +05:30
Katya Macedo
78c1a2bd66
Remove limit from timeseries (#13457)
CI build failures seem unrelated to docs
2022-12-02 12:19:59 -08:00
Jill Osborne
138a6de507
Update nested columns docs (#13461)
* Update nested columns docs

(cherry picked from commit 04206c5179e0eb46a30d4113c7332daee46c390d)

* Update nested-columns.md

(cherry picked from commit 8085ee7217d90e0e3f133985a52ec2e0b0552992)
2022-12-01 10:47:32 -08:00
317brian
cc2e4a80ff
doc: add a basic JDBC tutorial (#13343)
* initial commit for jdbc tutorial

(cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb)

* add commentary

* address comments from charles

* add query context to example

* fix typo

* add links

* Apply suggestions from code review

Co-authored-by: Frank Chen <frankchen@apache.org>

* fix datatype

* address feedback

* add parameterize to spelling file. the past tense version was already there

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-11-30 16:25:35 -08:00
Jill Osborne
291ded22d5
Update experimental features doc (#13452) 2022-11-30 16:14:43 +05:30
Jill Osborne
5c520e0cf9
Update LDAP configuration docs (#13245)
* Update LDAP configuration docs

* Updated after review

* Update auth-ldap.md

Updated.

* Update auth-ldap.md

* Updated spelling file

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-11-29 09:26:32 -08:00
Jill Osborne
100a2aa4a2
Update and document experimental features (#13348)
* Update and document experimental features
* Updated
* Update experimental-features.md
* Update docs/development/experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Updated after review
* Updated
* Update materialized-view.md
* Update experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-29 08:01:28 +05:30
Jill Osborne
db7c29c6f9
Correction to firehose migration doc (#13423) 2022-11-28 10:24:27 +05:30
Adarsh Sanjeev
280a0f7158
Add sequential sketch merging to MSQ (#13205)
* Add sketch fetching framework

* Refactor code to support sequential merge

* Update worker sketch fetcher

* Refactor sketch fetcher

* Refactor sketch fetcher

* Add context parameter and threshold to trigger sequential merge

* Fix test

* Add integration test for non sequential merge

* Address review comments

* Address review comments

* Address review comments

* Resolve maxRetainedBytes

* Add new classes

* Renamed key statistics information class

* Rename fetchStatisticsSnapshotForTimeChunk function

* Address review comments

* Address review comments

* Update documentation and add comments

* Resolve build issues

* Resolve build issues

* Change worker APIs to async

* Address review comments

* Resolve build issues

* Add null time check

* Update integration tests

* Address review comments

* Add log messages and comments

* Resolve build issues

* Add unit tests

* Add unit tests

* Fix timing issue in tests
2022-11-22 09:56:32 +05:30
Jill Osborne
68018a808f
Firehose migration doc (#12981)
* Firehose migration doc

* Update migrate-from-firehose-ingestion.md

* Updated with review comments and suggestions

* Update migrate-from-firehose-ingestion.md

* Update migrate-from-firehose-ingestion.md

* Update migrate-from-firehose-ingestion.md
2022-11-21 11:17:12 -08:00
Gian Merlino
bfffbabb56
Async task client for SeekableStreamSupervisors. (#13354)
Main changes:
1) Convert SeekableStreamIndexTaskClient to an interface, move old code
   to SeekableStreamIndexTaskClientSyncImpl, and add new implementation
   SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient.
2) Add "chatAsync" parameter to seekable stream supervisors that causes
   the supervisor to use an async task client.
3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making
   blocking RPC calls in workerExec threads.
4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList
   to FutureUtils.coalesce, so we can better capture the errors that occurred
   with contacting individual tasks.

Other, related changes:
1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether
   ServiceClient retries unavailable services. Useful since we do not
   want to retry calls unavailable tasks within the service client. (The
   supervisor does its own higher-level retries.)
2) Add FutureUtils.transformAsync, a more lambda friendly version of
   Futures.transform(f, AsyncFunction).
3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but
   returns Either instead of using null on error.
4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.
2022-11-21 19:20:26 +05:30
Katya Macedo
fd239305d9
Update metrics doc (#13316)
Changes:
- used inline code-style to format dimension names
- removed unnecessary punctuation
2022-11-21 09:43:52 +05:30
Jill Osborne
a860baf496
Updated docs on front coding (#13387) 2022-11-19 00:01:04 -08:00
Laksh Singla
9e938b5a6f
Add a limit to the number of columns in the CLUSTERED BY clause (#13352)
* Add clustered by limit

* change semantics, add docs

* add fault class to the module

* add test

* unambiguate test
2022-11-15 22:05:15 +05:30
Clint Wylie
1231ce3b75
dump-segment tool support for examining nested columns (#13356)
* add nested mode to dump segment tool to dump nested columns

* docs

* more test

* fix it
2022-11-14 16:08:47 -08:00
Jill Osborne
b0db2a87d8
Update Kafka ingestion tutorial (#13261)
* Update Kafka ingestion tutorial

* Update tutorial-kafka.md

Updated location of sample data file

* Added sample data file

* Update tutorial-kafka.md

* Add sample data file

* Update tutorial-kafka.md

Updated sample file location in curl commands

* Update and reuploading sample data files

* Updated spelling file

* Delete .spelling

* Added spelling file

* Update docs/tutorials/tutorial-kafka.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-kafka.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Updated after review

* Update tutorial-kafka.md

* Updated

* Update tutorial-kafka.md

* Update tutorial-kafka.md

* Update tutorial-kafka.md

* Updated sample data file and command

* Add files via upload

* Delete kttm-nested-data.json.tgz

* Delete kttm-nested-data.json.tgz

* Add files via upload

* Update tutorial-kafka.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-11-11 14:47:54 -08:00
Jill Osborne
47dd4ed2e7
Added experimental feature text for front coding feature (#13349) 2022-11-11 02:06:13 -08:00
Didip Kerabat
56d5c9780d
Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027)
* Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects.

Removed:

import org.apache.commons.io.FilenameUtils;

Add:

import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

* Forgot to update CloudObjectInputSource as well.

* Fix tests.

* Removed unused exceptions.

* Able to reduced user mistakes, by removing the protocol and the bucket on filter.

* add 1 more test.

* add comment on filterWithoutProtocolAndBucket

* Fix lint issue.

* Fix another lint issue.

* Replace all mention of filter -> objectGlob per convo here:

https://github.com/apache/druid/pull/13027#issuecomment-1266410707

* fix 1 bad constructor.

* Fix the documentation.

* Don’t do anything clever with the object path.

* Remove unused imports.

* Fix spelling error.

* Fix incorrect search and replace.

* Addressing Gian’s comment.

* add filename on .spelling

* Fix documentation.

* fix documentation again

Co-authored-by: Didip Kerabat <didip@apple.com>
2022-11-10 23:46:40 -08:00
Gian Merlino
77478f25fb
Add taskActionType dimension to task/action/run/time. (#13333)
* Add taskActionType dimension to task/action/run/time.

* Spelling.
2022-11-11 12:00:08 +05:30
Andreas Maechler
03175a2b8d
Add missing MSQ error code fields to docs (#13308)
* Fix typo

* Fix some spacing

* Add missing fields

* Cleanup table spacing

* Remove durable storage docs again

Thanks Brian for pointing out previous discussions.

* Update docs/multi-stage-query/reference.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Mark codes as code

* And even more codes as code

* Another set of spaces

* Combine `ColumnTypeNotSupported`

Thanks Karan.

* More whitespaces and typos

* Add spelling and fix links

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-11-10 21:03:04 +05:30
Jill Osborne
c2210c4e09
Update ingestion spec doc (#13329)
* Update ingestion spec doc

* Updated

* Updated

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Updated

* Updated

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2022-11-10 02:54:35 -08:00
Jill Osborne
965e41538e
Update nested columns doc (#13314)
* Updated nested columns doc

* Update nested-columns.md

* Update nested-columns.md
2022-11-10 09:53:28 +08:00
AmatyaAvadhanula
a2013e6566
Enhance streaming ingestion metrics (#13331)
Changes:
- Add a metric for partition-wise kafka/kinesis lag for streaming ingestion.
- Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR}
- Document metrics
2022-11-09 23:44:15 +05:30
Laksh Singla
b7a513fe09
Add a OverlordHelper that cleans up durable storage objects in MSQ (#13269)
* scratch

* s3 ls fix, add docs

* add documentation, update method name

* Add tests, address commits, change default value of the helper

* fix test

* update the default value of config, remove initial delay config

* Trigger Build

* update class

* add more tests

* docs update

* spellcheck

* remove ioe from the signature

* add back dmmy constructor for initialization

* fix guice bindings, intellij inspections
2022-11-09 17:23:35 +05:30
Kashif Faraz
ff8e0c3397
Fix issues with caching cost strategy (#13321)
`cachingCost` strategy has some discrepancies when compared to cost strategy.
This commit addresses two of these by retaining the same behaviour as the `cost` strategy
when computing the cost of moving a segment to a server:
- subtract the self cost of a segment if it is being served by the target server
- subtract the cost of segments that are marked to be dropped

Other changes:
- Add tests to verify fixed strategy. These tests would fail without the fixes made to `CachingCostStrategy.computeCost()`
- Fix the definition of the segment related metrics in the docs.
- Fix some docs issues introduced in #13181
2022-11-08 16:11:39 +05:30
Tejaswini Bandlamudi
594545da55
Adds cluster level idleConfig setting for supervisor (#13311)
* adds cluster level idleConfig

* updates docs

* refactoring

* spelling nit

* nit

* nit

* refactoring
2022-11-08 14:54:14 +05:30
Gian Merlino
48528a0c98
MSQ: Fix task lock checking during publish, fix lock priority. (#13282)
* MSQ: Fix task lock checking during publish, fix lock priority.

Fixes two issues:

1) ControllerImpl did not properly check the return value of
   SegmentTransactionalInsertAction when doing a REPLACE. This could cause
   it to not realize that its locks were preempted.

2) Task lock priority was the default of 0. It should be the higher
   batch default of 50. The low priority made it possible for MSQ tasks
   to be preempted by compaction tasks, which is not desired.

* Restructuring, add docs.

* Add performSegmentPublish tests.

* Fix tests.
2022-11-08 09:27:34 +05:30
Jill Osborne
d1a4de022a
Update retention rules doc (#13181)
* Update retention rules doc

* Update rule-configuration.md

* Updated

* Updated

* Updated

* Updated

* Update rule-configuration.md

* Update rule-configuration.md
2022-11-07 14:47:33 -08:00
AmatyaAvadhanula
a738ac9ad7
Improve task pause logging and metrics for streaming ingestion (#13313)
* Improve task pause logging and metrics for streaming ingestion

* Add metrics doc

* Fix spelling
2022-11-07 21:33:54 +05:30
AmatyaAvadhanula
47c32a9d92
Skip ALL granularity compaction (#13304)
* Skip autocompaction for datasources with ETERNITY segments
2022-11-07 17:55:03 +05:30
Gian Merlino
227b57dd8e
Compaction: Fetch segments one at a time on main task; skip when possible. (#13280)
* Compaction: Fetch segments one at a time on main task; skip when possible.

Compact tasks include the ability to fetch existing segments and determine
reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec.
This is a useful feature that makes compact tasks work well even when the
user running the compaction does not have a clear idea of what they want
the compacted segments to be like.

However, this comes at a cost: it takes time, and disk space, to do all
of these fetches. This patch improves the situation in two ways:

1) When segments do need to be fetched, download them one at a time and
   delete them when we're done. This still takes time, but minimizes the
   required disk space.

2) Don't fetch segments on the main compact task when they aren't needed.
   If the user provides a full granularitySpec, dimensionsSpec, and
   metricsSpec, we can skip it.

* Adjustments.

* Changes from code review.

* Fix logic for determining rollup.
2022-11-07 14:50:14 +05:30
Gian Merlino
9423aa9163
MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters. (#13274)
* MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters.

This consideration is important, because otherwise we can run out of
memory due to large statistics-tracking objects.

* Improved calculations.
2022-11-07 14:27:18 +05:30
Gian Merlino
8f90589ce5
Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247)
* Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.

These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.

This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.

As to impact, Druid finalizes aggregators in two cases:

- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
  post-aggregator (not any other kind of post-aggregator).

With this patch, the functions will no longer be finalized in these cases.

The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.

So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.

Other changes:

1) Move various argument-checking logic from runtime to planning time in
   DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.

2) Add various JsonIgnores to the sketches to simplify their JSON representations.

3) Allow chaining of ExpressionPostAggregators and other PostAggregators
   in the SQL layer.

4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
   now that expressions can operate on complex inputs.

5) Adjust return type to thetaSketch (instead of OTHER) in
   ThetaSketchSetBaseOperatorConversion.

* Fix benchmark class.

* Fix compilation error.

* Fix ThetaSketchSqlAggregatorTest.

* Hopefully fix ITAutoCompactionTest.

* Adjustment to ITAutoCompactionTest.
2022-11-03 09:43:00 -07:00