Commit Graph

11417 Commits

Author SHA1 Message Date
Gian Merlino 76d281d64f
Enable allocating segments at ALL granularity. (#12003)
* Enable allocating segments at ALL granularity.

The main change is that Granularity.granularitiesFinerThan will return ALL if ALL
is passed in.

Allocating segments at ALL granularity is somewhat unconventional, but there
is nothing wrong with it, and it actually makes a lot of sense for tables that
are meant to be used for lookups or dimensions rather than main fact tables.
This change enables ALL segmentGranularity to work properly in appendToExisting
mode.

Also clarifies behavior in javadocs and tests.

* Move tests to improve coverage.
2021-12-03 14:15:05 -08:00
Gian Merlino bc2cc47db6
SeekableStreamSupervisor: Coalesce adjacent RunNotices. (#12018)
The idea is that if multiple notices come in around the same time due
to rapid task status changes, we only need to execute one of them.
2021-12-03 13:42:03 -08:00
Jihoon Son fc9513b6cd
Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012)
* Make nodeRole available during binding; add support for dynamic registration of DruidService

* fix checkstyle and test

* fix customRole test

* address comments

* add more javadoc
2021-12-03 11:59:00 -08:00
Gian Merlino e0e05aad99
Enhancements to IndexTaskClient. (#12011)
* Enhancements to IndexTaskClient.

1) Ability to use handlers other than StringFullResponseHandler. This
   functionality is not used in production code yet, but is useful
   because it will allow tasks to communicate with each other in
   non-string-based formats and in streaming fashion. In the future,
   we'll be able to use this to make task-to-task communication
   more efficient.

2) Truncate server errors at 1KB, so long errors do not pollute logs.

3) Change error log level for retryable errors from WARN to INFO. (The
   final error is still WARN.)

4) Harmonize log and exception messages to have a more consistent format.

* Additional tests and improvements.
2021-12-03 09:14:32 -08:00
jacobtolar f7f5505631
Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865)
* Update docs - Kinesis InputFormat ingestion

* Add avro_ocf to list of supported Kafka InputFormats

* Remove extra whitespace.

* Update kafka-supervisor-reference.md

* Delete extra whitespace.
2021-12-03 07:57:26 -08:00
Frank Chen c2cea25a6b
Improve exception message when loading data from web-console (#11723)
* Improve exception handling

* Revert some changes

* Resolve comments

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* Address review comments

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2021-12-03 21:33:49 +08:00
Frank Chen 4631a66723
Support rolling log files (#10147)
* apply log file rolling strategy

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Use absolute log path and allow spaces in log path

* Update log4j2 configuration

* apply FileAppender to ZooKeeper

* DO NOT redirect application's console log to file in supervisor
2021-12-03 21:32:01 +08:00
Charles Smith 7ed46800c3
Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983)
Adds documentation for multi-dimension partitioning. cc: @kfaraz
Refactors the native batch partitioning topic as follows:

Native batch ingestion covers parallel-index
Native batch simple task indexing covers index
Native batch input sources covers ioSource
Native batch ingestion with firehose covers deprecated firehose
2021-12-03 16:37:14 +05:30
Abhishek Agarwal 503384569a
Fix classNotFoundException when connecting to secure LDAP (#11978)
This PR fixes a problem where the com.sun.jndi.ldap.Connection tries to build BasicSecuritySSLSocketFactory when calling LDAPCredentialsValidator.validateCredentials since BasicSecuritySSLSocketFactory is in extension class loader and not visible to system classloader.
2021-12-03 12:08:19 +05:30
Clint Wylie af6541a236
allow `DruidSchema` to fallback to segment metadata 'type' if 'typeSignature' is null (#12016)
* allow `DruidSchema` to fallback to segment metadata type if typeSignature is null, to avoid producing incorrect SQL schema if broker is upgraded to 0.23 before historicals

* mmm, forbidden tests
2021-12-02 17:42:01 -08:00
Clint Wylie 84b4bf56d8
vectorize logical operators and boolean functions (#11184)
changes:
* adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
2021-12-02 16:40:23 -08:00
Gian Merlino f47afd7b98
HttpResponseHandler: Fill out truncated javadoc. (#12004) 2021-12-02 14:05:51 -08:00
Vadim Ogievetsky 1f95a42bb8
Web console: updated the explain dialog to use new explain output (#12009)
This is the UI followup to the work done in #11908

Updated the Explain dialog to use the new output format.
2021-12-02 00:18:11 +05:30
Karan Kumar ffa553593f
Use one factory in json reader (#11999) 2021-12-01 16:17:48 +05:30
benkrug 11746b8536
Update datasketches-hll.md (#12010)
under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively."  21 is not a power of 2, nor is 12, the given default.  I think there may have been confusion because lgK represents log2 of K.  We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.
2021-11-30 18:52:00 -08:00
Paul Rogers a66f10eea1
Code cleanup from query profile project (#11822)
* Code cleanup from query profile project

* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse

* Reverted change due to lack of tests
2021-11-30 11:35:38 -08:00
Gian Merlino f6e6ca2893
Use intermediate-persist IndexSpec during multiphase merge. (#11940)
* Use intermediate-persist IndexSpec during multiphase merge.

The main change is the addition of an intermediate-persist IndexSpec
to the main "merge" method in IndexMerger. There are also a few minor
adjustments to the IndexMerger interface to encourage more harmonious
usage of its methods in the future.

* Additional changes inspired by the test coverage checker.

- Remove unused-in-production IndexMerger methods "append" and "convert".
- Add additional unit tests to UnifiedIndexerAppenderatorsManager.

* Additional adjustments.

* Even more additional adjustments.

* Test fixes.
2021-11-29 15:08:49 -08:00
Charles Smith f536f31229
clarify avro support & general style improvements (#11975)
* clarify avro support & general style improvements

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update avro.md

remove redundancy

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-11-28 16:10:18 +08:00
Gian Merlino 93aeaf4801
Improve on-heap aggregator footprint estimates. (#11950)
Add a "guessAggregatorHeapFootprint" method to AggregatorFactory that
mitigates #6743 by enabling heap footprint estimates based on a specific
number of rows. The idea is that at ingestion time, the number of rows
that go into an aggregator will be 1 (if rollup is off) or will likely
be a small number (if rollup is on).

It's a heuristic, because of course nothing guarantees that the rollup
ratio is a small number. But it's a common case, and I expect this logic
to go wrong much less often than the current logic. Also, when it does
go wrong, users can fix it by lowering maxRowsInMemory or
maxBytesInMemory. The current situation is unintuitive: when the
estimation goes wrong, users get an OOME, but actually they need to
*raise* these limits to fix it.
2021-11-28 13:21:24 +05:30
Agustin Gonzalez 8eff6334f7
AWS "Data read has a different length than the expected" error should reset stream and try again (#11941)
* Add support for custom reset condition & support for other args to have defaults to make the method api consistent

* Add support for custom reset condition to InputEntity

* Fix test names

* Clarifying comments to why we need to read the message's content to identify S3's resettable exception

* Add unit test to verify custom resettable condition for S3Entity

* Provide a way to customize retries since they are expensive to test
2021-11-26 12:45:34 -07:00
Sandeep 9bc18a93a2
warn when segment cannot be loaded by Historical nodes (#11849) 2021-11-26 17:27:17 +08:00
Kashif Faraz b48f5a576b
Fix: Do not require time condition on InlineDataSource (#11982)
For queries on logical values, e.g. SELECT 1337, we need not check for
a filter on __time column even if requireTimeCondition is true.
2021-11-25 21:10:06 +05:30
Laksh Singla c381cae51b
Improve the output of SQL explain message (#11908)
Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side.

This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).
2021-11-25 21:08:33 +05:30
Frank Chen 98957be044
Return HTTP 404 instead of 400 for supervisor/task endpoints (#11724)
* Use 404 instead of 400

* Use 404 instead of 400

* Add UT test cases

* Add IT testcases

* add UT for task resource filter

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Using org.testing.Assert instead of org.junit.Assert

* Resolve comments and fix test

* Fix test

* Fix tests

* Resolve comments
2021-11-25 13:09:47 +08:00
Rohan Garg 2c08055962
Specify time column for first/last aggregators (#11949)
Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.
2021-11-25 09:44:14 +05:30
Gian Merlino 3d72e66f56
Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582)
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.

This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.

In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.

So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!

* Fix test compiling.

* Fixes and improvements.

* Fix forbidden API.

* Additional fixes.
2021-11-24 14:51:53 -08:00
Gian Merlino 12e2228510
RowBasedGrouperHelper: Set hasMultipleValues = false in capabilities. (#11954)
Useful because it enables anything that consumes groupBy results to
potentially operate more efficiently.
2021-11-24 13:14:58 -08:00
Gian Merlino 5e168b861a
StorageAdapter: Add getRowSignature method. (#11953)
Simplifies logic for callers that only want to get a list of all the
column names, or column names and types. Updated callers SegmentAnalyzer,
HashJoinSegmentStorageAdapter, and DruidSegmentReader.
2021-11-24 13:14:25 -08:00
Gian Merlino 0354407655
SQL INSERT planner support. (#11959)
* SQL INSERT planner support.

The main changes are:

1) DruidPlanner is able to validate and authorize INSERT queries. They
   require WRITE permission on the target datasource.

2) QueryMaker is now an interface, and there is a QueryMakerFactory that
   creates instances of it. There is only one production implementation
   of each (NativeQueryMaker and NativeQueryMakerFactory), which
   together behave the same way as the former QueryMaker class. But this
   opens the door to executing queries in ways other than the Druid
   query stack, and is used by unit tests (CalciteInsertDmlTest) to
   test the INSERT planning functionality.

3) Adds an EXTERN table macro that allows references external data using
   InputSource and InputFormat from Druid's batch ingestion API. This is
   not exposed in production yet, but is used by unit tests.

4) Adds a QueryFeature concept that enables the planner to change its
   behavior slightly depending on the capabilities of the execution
   system.

5) Adds an "AuthorizableOperator" concept that enables SqlOperators
   to require additional permissions. This is used by the EXTERN table
   macro.

Related odds and ends:

- Add equals, hashCode, toString methods to InlineInputSource. Aids in
  the "from external" tests in CalciteInsertDmlTest.
- Add JSON-serializability to RowSignature.
- Move the SQL string inside PlannerContext so it is "baked into" the
  planner when the planner is created. Cleans up the code a bit, since
  in practice, the same query is passed in every time to the
  same planner anyway.

* Fix up calls to CalciteTests.createMockQueryLifecycleFactory.

* Fix checkstyle issues.

* Adjustments for CI.

* Adjust DruidAvaticaHandlerTest for stricter test authorizations.
2021-11-24 12:14:04 -08:00
Maytas Monsereenusorn bb3d2a433a
Support filtering data in Auto Compaction (#11922)
* add impl

* fix checkstyle

* add test

* add test

* add unit tests

* fix unit tests

* fix unit tests

* fix unit tests

* add IT

* add IT

* add comments

* fix spelling
2021-11-24 10:56:38 -08:00
Kashif Faraz 48dbe0ea45
Handle null values in Range Partition dimension distribution (#11973)
This PR adds support for handling null dimension values while creating partition boundaries
in range partitioning.

This means that we can now have partition boundaries like [null, "abc"] or ["abc", null, "def"].
2021-11-24 14:30:02 +05:30
cheddar e6570cadc4
Update LifecycleModule.java (#11972)
Update the javadoc on LifecycleModule to be more clear about why the register methods exist and why they should always be used instead of Guice's eager instantiation.
2021-11-23 17:03:37 -08:00
Abhishek Agarwal b6a0fbc8b6
Break down CalciteQueryTest (#11979)
* Refactor calciteQueryTest

* Move more tests to CalciteJoinQueryTest
2021-11-24 00:15:42 +05:30
Agustin Gonzalez 311d9a2370
Log correct hydrant count (#11976) 2021-11-23 08:22:17 -08:00
Kashif Faraz 6607e4cc75
Docs: Remove reference to deprecated field `targetPartitionSize` (#11974)
* Remove reference to deprecated field `targetPartitionSize`

* Fix spelling of LeaderLatch
2021-11-23 15:32:16 +05:30
Clint Wylie e22abb68b0
Update .spelling (#11977) 2021-11-22 22:28:51 -08:00
Laksh Singla b5a25f24f2
Improve the DruidRexExecutor w.r.t handling of numeric arrays (#11968)
DruidRexExecutor while reducing Arrays, specially numeric arrays, doesn't convert the value from ExprResult's type to BigDecimal, which causes makeLiteral to cast the values. Also, if NaN or Infinite values are present in the array, the error is a generic NumberFormatException. For example:

SELECT ARRAY[1.11, 2.22] returns [1, 2]
SELECT SQRT(-1) throws a generic NumberFormatException instead of IAE

This PR introduces change to cast the numeric values to BigDecimal since Calcite's library understands that easily, and doesn't perform casts.
2021-11-23 11:40:59 +05:30
Peter Marshall ed0606db69
Docs - Corrected admonition issue (#11926)
* Corrected admonition issue

* Update data-formats.md

Removed all admonition bits, and took out sf linebreaks.

* Update data-formats.md

Changed the shocker line into something a little more practical.
2021-11-22 12:14:30 -08:00
Katya Macedo 706d057ccc
corrected leaderlatch name (#11966) 2021-11-22 11:58:42 -08:00
Gian Merlino 35b610ada7
QueryableIndexColumnSelectorFactory: Double-check cached column class. (#11957)
Important because an earlier call to getCachedColumn may have been
done with a different class, leading to a ClassCastException on the
second call. In the prior code, this could happen if a complex column
had makeDimensionSelector called on it after makeColumnValueSelector had
already been called.
2021-11-22 11:31:24 -08:00
Gian Merlino d6507c9428
PrioritizedExecutorService: Properly wrap on direct calls to "execute". (#11956)
Usually, "execute" is called by methods defined in the superclass
AbstractExecutorService, and the passed-in Runnable has been wrapped
by newTaskFor inside a PrioritizedListenableFutureTask. But this method
can also be called directly, and if so, the same wrapping is necessary
for the delegate to get a Runnable that can be entered into a priority
queue with the others.
2021-11-22 10:30:12 -08:00
TSFenwick a4cb1de87a
get rid of class cast exception and add a new testcase for that issue (#11951) 2021-11-22 08:44:20 -08:00
jacobtolar 0a9a908031
Add inline native query example to tutorial (#11642)
* Add inline native query example to tutorial

Minor change to the tutorial that adds an example of a native HTTP query request body, and adds a link to the more detailed "native query over HTTP" documentation.

* cleanup

* Apply suggestions from code review.

Co-authored-by: sthetland <steve.hetland@imply.io>

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-11-22 21:35:05 +08:00
Sandeep b1de56a3be
update Druid Chart README doc and removes unnecessary lock file (#11945)
* update Druid Chart README doc and removes unnecessary lock file

* update Druid Chart README doc and removes unnecessary lock file
2021-11-22 21:34:26 +08:00
XIAO WANG f1cf1c8f39
update count distinct tests (#11927)
Co-authored-by: wangxiao060 <wangxiao060@ke.com>
2021-11-22 21:34:00 +08:00
Peter Marshall 0c0001579d
Update compaction.md (#11937)
Removed superfluous tabs that caused issues in rendering
Added nav to the `inputSpec`
2021-11-22 21:33:47 +08:00
jacobtolar 3aee5d9ec3
Fix: invalid JSON in ingestion spec doc example (#11880)
* Fix: invalid JSON in ingestion spec doc example

* Update ingestion-spec.md
2021-11-22 21:33:26 +08:00
Frank Chen e77938b205
Add thread count to pre-push hook to speed up checking (#11808)
* Add thread count to accelerate checking

* add comment
2021-11-22 21:33:01 +08:00
Frank Chen cfd60f1222
Improve README for integration test (#11860)
* Optimize IT readme

* Resolve comments
2021-11-22 21:32:36 +08:00
Gian Merlino b13f07a057
Harmonize local input sources; fix batch index integration test. (#11965)
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.

Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.

I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.

* Sort files in LocalFirehoseFactory.
2021-11-21 22:26:31 -08:00