Currently, the advance function in PostJoinCursor calls advanceUninterruptibly, which in turn keeps calling baseCursor.advanceUninterruptibly until the post-join condition matches, without ever checking for interrupts. This pins the CPU at 100% without giving the query a chance to be cancelled.
With this change, the call flows of advance and advanceUninterruptibly are separated so that they call baseCursor.advance and baseCursor.advanceUninterruptibly respectively, giving the query a chance to be interrupted between successive calls to baseCursor.advance in the former case.
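Below is a minimal, self-contained sketch of the separated call flow. The Cursor interface here is a stand-in for Druid's org.apache.druid.segment.Cursor, and postJoinMatches() stands in for the post-join condition check; the real class differs in detail.

```java
interface Cursor
{
  void advance();                // checks for interrupts between rows
  void advanceUninterruptibly(); // performs no interrupt checks
  boolean isDone();
}

class PostJoinCursorSketch implements Cursor
{
  private final Cursor baseCursor;

  PostJoinCursorSketch(Cursor baseCursor)
  {
    this.baseCursor = baseCursor;
  }

  // Placeholder for evaluating the post-join condition on the current row.
  private boolean postJoinMatches()
  {
    return true;
  }

  @Override
  public void advance()
  {
    // Uses the interruptible base advance, so cancellation is observed even
    // when many consecutive rows fail the post-join condition.
    do {
      baseCursor.advance();
    } while (!isDone() && !postJoinMatches());
  }

  @Override
  public void advanceUninterruptibly()
  {
    do {
      baseCursor.advanceUninterruptibly();
    } while (!isDone() && !postJoinMatches());
  }

  @Override
  public boolean isDone()
  {
    return baseCursor.isDone();
  }
}
```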
* Fix error assuming a Complex Type that is a Number is a double
In the case where a complex type is a Number, it may not be directly castable to double. It can safely be cast to Number first to get the doubleValue.
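A minimal sketch of the safer conversion, assuming the complex value is known to be some subclass of Number (e.g. a Long or Float) rather than necessarily a Double:

```java
static double toDouble(Object complexValue)
{
  // (double) complexValue would throw ClassCastException for a Long or Float;
  // widening through the Number interface is safe for any Number subclass.
  return ((Number) complexValue).doubleValue();
}
```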
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a bug in the frame serializer code that converted a null string to the literal string "null"
- fixes some DrillWindowQueryTest cases
- fixes an NPE in NaiveSortOperator in case there was no input
- re-enables CoreRules.AGGREGATE_REMOVE
- adds a processing-level OffsetLimit class and uses that instead of just the limit in the RAC parts (see the sketch after this list)
- earlier, window expressions on top of a subquery with an offset may have ignored the offset
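A minimal sketch of what a processing-level OffsetLimit can look like: pairing the offset with the limit so that "skip N rows, then return at most M rows" can be applied in the RowsAndColumns operators instead of only a limit. The field and method names below are illustrative, not necessarily the real class.

```java
class OffsetLimit
{
  private final long offset;
  private final long limit; // negative means "no limit" in this sketch

  OffsetLimit(long offset, long limit)
  {
    this.offset = offset;
    this.limit = limit;
  }

  // First row index (inclusive) to keep out of numRows total rows.
  long getFromIndex(long numRows)
  {
    return Math.min(offset, numRows);
  }

  // Last row index (exclusive) to keep out of numRows total rows.
  long getToIndex(long numRows)
  {
    final long from = getFromIndex(numRows);
    return limit < 0 ? numRows : Math.min(from + limit, numRows);
  }
}
```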
Currently, inter-Druid communication via REST endpoints is based on JSON-formatted payloads. Upon a parsing error, there is only a generic exception stating the expected JSON token type and the current JSON token type. There is no detailed error log about the content of the payload causing the violation.
In the microservice world, the trend is to deploy Druid servers in Kubernetes with a service mesh. Often the Istio proxy or another proxy is used to intercept the network connections between Druid servers. The proxy may return error messages for various reasons, and these messages are not what the JSON parser expects. The generic error message from Druid can be very misleading, as the user may think the message is based on the response from the other Druid server.
For example, this is the kind of mysterious error message you get:
QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088}"
while the actual content of the message is the following, returned by the proxy when it cannot tunnel the network connection:
upstream connect error or disconnect/reset before header
So this very simple PR just enhances the logging so that the real underlying message gets printed out. This would save a lot of head-scratching time when Druid is deployed with a mesh network.
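A minimal sketch of the idea (not the exact Druid code): when the first token is not the expected START_ARRAY, include the raw payload text in the error so that proxy-generated bodies like the one above become visible.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;

class PayloadAwareParsing
{
  static JsonParser openArray(String payload, String url) throws IOException
  {
    final JsonParser parser = new JsonFactory().createParser(payload);
    final JsonToken token = parser.nextToken();
    if (token != JsonToken.START_ARRAY) {
      // Surface the offending payload instead of only the token types.
      throw new IllegalArgumentException(
          String.format(
              "Next token wasn't a START_ARRAY, was [%s] from url [%s], content: [%s]",
              token, url, payload
          )
      );
    }
    return parser;
  }
}
```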
Co-authored-by: Kai Sun <kai.sun@salesforce.com>
* provide function name when unknown exceptions are encountered
* fix keywords/etc
* fix keyword order - regex exercise
* add test
* add check&fix keywords
* decoupledIgnore
* Revert "decoupledIgnore"
This reverts commit e922c820a7.
* unpatch Function
* move to a different location
* checkstyle
* Ability to send task types to k8s or worker task runner
* add more tests
* use runnerStrategy to determine task runner
* minor refine
* refine runner strategy config
* move workerType config to upper level
* validate config when application starts
* Update S3 retry logic based on the underlying cause in case of IOException.
For instance, 4xx and other errors wrapped in an IOException aren't retriable (see the sketch after this list).
* Fix CI
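A minimal sketch of cause-aware retry classification (illustrative, not the exact Druid S3Utils code): an IOException is treated as retriable only based on its underlying cause, so for example a 4xx AmazonServiceException wrapped in an IOException is not retried.

```java
import com.amazonaws.AmazonServiceException;
import java.io.IOException;

class S3RetryCheck
{
  static boolean isRetriable(Throwable t)
  {
    if (t instanceof AmazonServiceException) {
      final int status = ((AmazonServiceException) t).getStatusCode();
      // Retry on throttling and server-side errors, not on other 4xx client errors.
      return status == 429 || status >= 500;
    }
    if (t instanceof IOException) {
      // Don't blindly retry an IOException; look at what actually caused it.
      return t.getCause() == null || isRetriable(t.getCause());
    }
    return false;
  }
}
```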
for some exotic queries like:
SELECT
'_'||dim1,
MIN(cast(0 as double)) OVER (),
MIN(cast((cnt||cnt) as bigint)) OVER ()
FROM foo
the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly
A SegmentTransactionalReplaceAction must only update the mapping of tasks with append locks that are running concurrently. To ensure this, we return the supervisor id only if the corresponding task has taskLockType set to APPEND in its context.
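A minimal sketch of that guard; the helper class, context key constant, and method names below are illustrative, not the exact Druid code.

```java
import java.util.Map;

class SupervisorIdResolver
{
  private static final String TASK_LOCK_TYPE_KEY = "taskLockType";

  // Return the supervisor id only for tasks that use APPEND locks, i.e. tasks
  // that may be appending segments concurrently with the replacing task.
  static String supervisorIdIfAppendLocked(String supervisorId, Map<String, Object> taskContext)
  {
    final Object lockType = taskContext.get(TASK_LOCK_TYPE_KEY);
    return "APPEND".equals(String.valueOf(lockType)) ? supervisorId : null;
  }
}
```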
This PR:
adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct contents
some resultset mismatches were fixed by this
updates parquet to 1.13.1
* add native filters for "(filter) is true" and "(filter) is false"
changes:
* add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE`
* add IsBooleanFilter for the actual filtering logic of these filters, which ignores the caller's includeUnknown and always uses matches with false for IS TRUE and !matches with true for IS FALSE (see the sketch after this list)
* fix a test that was incorrectly adjusted to a wrong answer in #15058
* add tests for default value mode
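A minimal, self-contained sketch of that matcher logic; the ValueMatcher interface here mirrors the shape of Druid's three-valued-logic matcher but is simplified for illustration.

```java
interface ValueMatcher
{
  // includeUnknown == true means "treat unknown (null) as a match".
  boolean matches(boolean includeUnknown);
}

class IsBooleanMatcher implements ValueMatcher
{
  private final ValueMatcher baseMatcher;
  private final boolean isTrue; // true for IS TRUE, false for IS FALSE

  IsBooleanMatcher(ValueMatcher baseMatcher, boolean isTrue)
  {
    this.baseMatcher = baseMatcher;
    this.isTrue = isTrue;
  }

  @Override
  public boolean matches(boolean includeUnknown)
  {
    // The caller's includeUnknown is ignored: unknowns never satisfy
    // IS TRUE or IS FALSE, so the strict form of the base match is used.
    return isTrue ? baseMatcher.matches(false) : !baseMatcher.matches(true);
  }
}
```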
Patch adds an undocumented parameter taskLockType to MSQ so that we can start enabling this feature for users who are interested in testing the new lock types.
* Separate k8s and druid task lifecycles
* Remove extra log lines
* Fix unit tests
* fix unit tests
* Fix unit tests
* notify listeners on task completion
* Fix unit test
* unused var
* PR changes
* Fix unit tests
* Fix checkstyle
* PR changes
This PR addresses a bug with waiting for segments to be loaded. In the case of an append, segments would be created with the same version, which caused the number of segments returned to be incorrect.
This PR changes that logic to also keep track of the range of partition numbers for each version, which lets the task wait for the correct set of segments. The partition numbers are expected to be contiguous since the task holds the lock for the segments while running.
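A minimal, self-contained sketch of the bookkeeping described above: for each segment version, track the min and max partition number so the wait covers exactly the segments this task created. The class and method names are illustrative, not the exact Druid code.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedPartitionRanges
{
  // version -> [minPartitionNum, maxPartitionNum], assumed contiguous
  private final Map<String, int[]> ranges = new HashMap<>();

  void register(String version, int partitionNum)
  {
    ranges.merge(
        version,
        new int[]{partitionNum, partitionNum},
        (oldRange, newRange) ->
            new int[]{Math.min(oldRange[0], newRange[0]), Math.max(oldRange[1], newRange[1])}
    );
  }

  // Only segments inside the tracked range for their version are waited on;
  // other segments that happen to share the version are ignored.
  boolean shouldWaitFor(String version, int partitionNum)
  {
    final int[] range = ranges.get(version);
    return range != null && partitionNum >= range[0] && partitionNum <= range[1];
  }
}
```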