* Code cleanup from query profile project
* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse
* Reverted change due to lack of tests
* Use intermediate-persist IndexSpec during multiphase merge.
The main change is the addition of an intermediate-persist IndexSpec
to the main "merge" method in IndexMerger. There are also a few minor
adjustments to the IndexMerger interface to encourage more harmonious
usage of its methods in the future.
* Additional changes inspired by the test coverage checker.
- Remove unused-in-production IndexMerger methods "append" and "convert".
- Add additional unit tests to UnifiedIndexerAppenderatorsManager.
* Additional adjustments.
* Even more additional adjustments.
* Test fixes.
Add a "guessAggregatorHeapFootprint" method to AggregatorFactory that
mitigates #6743 by enabling heap footprint estimates based on a specific
number of rows. The idea is that at ingestion time, the number of rows
that go into an aggregator will be 1 (if rollup is off) or will likely
be a small number (if rollup is on).
It's a heuristic, because of course nothing guarantees that the rollup
ratio is a small number. But it's a common case, and I expect this logic
to go wrong much less often than the current logic. Also, when it does
go wrong, users can fix it by lowering maxRowsInMemory or
maxBytesInMemory. The current situation is unintuitive: when the
estimation goes wrong, users get an OOME, but actually they need to
*raise* these limits to fix it.
* Add support for custom reset condition & support for other args to have defaults to make the method api consistent
* Add support for custom reset condition to InputEntity
* Fix test names
* Clarifying comments to why we need to read the message's content to identify S3's resettable exception
* Add unit test to verify custom resettable condition for S3Entity
* Provide a way to customize retries since they are expensive to test
Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side.
This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).
* Use 404 instead of 400
* Use 404 instead of 400
* Add UT test cases
* Add IT testcases
* add UT for task resource filter
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Using org.testing.Assert instead of org.junit.Assert
* Resolve comments and fix test
* Fix test
* Fix tests
* Resolve comments
Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.
This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.
In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.
So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!
* Fix test compiling.
* Fixes and improvements.
* Fix forbidden API.
* Additional fixes.
Simplifies logic for callers that only want to get a list of all the
column names, or column names and types. Updated callers SegmentAnalyzer,
HashJoinSegmentStorageAdapter, and DruidSegmentReader.
* SQL INSERT planner support.
The main changes are:
1) DruidPlanner is able to validate and authorize INSERT queries. They
require WRITE permission on the target datasource.
2) QueryMaker is now an interface, and there is a QueryMakerFactory that
creates instances of it. There is only one production implementation
of each (NativeQueryMaker and NativeQueryMakerFactory), which
together behave the same way as the former QueryMaker class. But this
opens the door to executing queries in ways other than the Druid
query stack, and is used by unit tests (CalciteInsertDmlTest) to
test the INSERT planning functionality.
3) Adds an EXTERN table macro that allows references external data using
InputSource and InputFormat from Druid's batch ingestion API. This is
not exposed in production yet, but is used by unit tests.
4) Adds a QueryFeature concept that enables the planner to change its
behavior slightly depending on the capabilities of the execution
system.
5) Adds an "AuthorizableOperator" concept that enables SqlOperators
to require additional permissions. This is used by the EXTERN table
macro.
Related odds and ends:
- Add equals, hashCode, toString methods to InlineInputSource. Aids in
the "from external" tests in CalciteInsertDmlTest.
- Add JSON-serializability to RowSignature.
- Move the SQL string inside PlannerContext so it is "baked into" the
planner when the planner is created. Cleans up the code a bit, since
in practice, the same query is passed in every time to the
same planner anyway.
* Fix up calls to CalciteTests.createMockQueryLifecycleFactory.
* Fix checkstyle issues.
* Adjustments for CI.
* Adjust DruidAvaticaHandlerTest for stricter test authorizations.
* add impl
* fix checkstyle
* add test
* add test
* add unit tests
* fix unit tests
* fix unit tests
* fix unit tests
* add IT
* add IT
* add comments
* fix spelling
This PR adds support for handling null dimension values while creating partition boundaries
in range partitioning.
This means that we can now have partition boundaries like [null, "abc"] or ["abc", null, "def"].
Update the javadoc on LifecycleModule to be more clear about why the register methods exist and why they should always be used instead of Guice's eager instantiation.
DruidRexExecutor while reducing Arrays, specially numeric arrays, doesn't convert the value from ExprResult's type to BigDecimal, which causes makeLiteral to cast the values. Also, if NaN or Infinite values are present in the array, the error is a generic NumberFormatException. For example:
SELECT ARRAY[1.11, 2.22] returns [1, 2]
SELECT SQRT(-1) throws a generic NumberFormatException instead of IAE
This PR introduces change to cast the numeric values to BigDecimal since Calcite's library understands that easily, and doesn't perform casts.
* Corrected admonition issue
* Update data-formats.md
Removed all admonition bits, and took out sf linebreaks.
* Update data-formats.md
Changed the shocker line into something a little more practical.
Important because an earlier call to getCachedColumn may have been
done with a different class, leading to a ClassCastException on the
second call. In the prior code, this could happen if a complex column
had makeDimensionSelector called on it after makeColumnValueSelector had
already been called.
Usually, "execute" is called by methods defined in the superclass
AbstractExecutorService, and the passed-in Runnable has been wrapped
by newTaskFor inside a PrioritizedListenableFutureTask. But this method
can also be called directly, and if so, the same wrapping is necessary
for the delegate to get a Runnable that can be entered into a priority
queue with the others.
* Add inline native query example to tutorial
Minor change to the tutorial that adds an example of a native HTTP query request body, and adds a link to the more detailed "native query over HTTP" documentation.
* cleanup
* Apply suggestions from code review.
Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: sthetland <steve.hetland@imply.io>
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.
Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.
I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.
* Sort files in LocalFirehoseFactory.
* add back and deprecate aggregator factory methods so i can say i told you so when i delete these later
* rename to make less ambiguous, fix fill method
* adjust
* Scan: Add "orderBy" parameter.
This patch adds an API for requesting non-time orderings, although it
does not actually add the ability to execute such queries.
The changes are done in such a way that no matter how Scan query objects
are constructed, they will have a correct "getOrderBy". This will enable
us to switch the execution to exclusively use "getOrderBy" later on when
it's implemented.
Scan queries are serialized such that they only include "order" (time
order) if the ordering is time-based, and they only include "orderBy" if
the ordering is non-time-based. This maximizes compatibility with
the existing API while also providing a clean look for formatted queries.
Because this patch does not include execution logic, if someone actually
tries to run a query with non-time ordering, then they will get an error
like "Cannot execute query with orderBy [quality ASC]".
* SQL module fixes.
* Add spotbugs-exclude.
* Remove unused method.
* Add worker category as dimension in TaskSlotCountStatsMonitor
* Change description
* Add workerConfig as field
* Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics
* Fixing tests
* Fixing alerts
* Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs
* Resolving false positive spell check
* addressing comments
* throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner
Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>
* Fix bug Unrecognized token 'No': was expecting (JSON String,...) when calling the API /druid/indexer/v1/task/taskId/reports and the report is not found
* Also log other non-OK statuses
* IMPLY-4344: Adding safe divide function along with testcases and documentation updates
* Changing based on review comments
* Addressing review comments, fixing coding style, docs and spelling
* Checkstyle passes for all code
* Fixing expected results for infinity
* Revert "Fixing expected results for infinity"
This reverts commit 5fd5cd480d.
* Updating test result and a space in docs
Unlike a real one, TestServerInventoryView would call segmentRemoved
any time _any_ segment was removed. It should only be called when _all_
segments have been removed.
* Use a simple class to sanitize sanitizable errors and log them
The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface
add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy
* return less information as part of too many connections, and instead only log specific details
This is so an end user gets relevant information but not too much info since they might now how
many brokers they have
* return only runtime exceptions
added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.
* dont reqrewite exceptions unless necessary for checked exceptions
add docs
avoid blanket turning all exceptions into runtime exceptions
* address comments, to fix up docs.
add more javadocs
add support UOE sanitization
* use try catch instead and sanitize at public methods
* checkstyle fixes
* throw noSuchStatement and NoSuchConnection as Avatica is affected by those
* address comments. move log error back to druid meta
clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions
* alter test to reflect new error message