druid

Commit Graph

Author	SHA1	Message	Date
Laksh Singla	b5a25f24f2	Improve the DruidRexExecutor w.r.t handling of numeric arrays (#11968 ) DruidRexExecutor, while reducing arrays, especially numeric arrays, doesn't convert the value from ExprResult's type to BigDecimal, which causes makeLiteral to cast the values. Also, if NaN or Infinite values are present in the array, the error is a generic NumberFormatException. For example: SELECT ARRAY[1.11, 2.22] returns [1, 2]; SELECT SQRT(-1) throws a generic NumberFormatException instead of IAE. This PR introduces a change to cast the numeric values to BigDecimal since Calcite's library understands that easily, and doesn't perform casts.	2021-11-23 11:40:59 +05:30
Peter Marshall	ed0606db69	Docs - Corrected admonition issue (#11926 ) * Corrected admonition issue * Update data-formats.md Removed all admonition bits, and took out sf linebreaks. * Update data-formats.md Changed the shocker line into something a little more practical.	2021-11-22 12:14:30 -08:00
Katya Macedo	706d057ccc	corrected leaderlatch name (#11966 )	2021-11-22 11:58:42 -08:00
Gian Merlino	35b610ada7	QueryableIndexColumnSelectorFactory: Double-check cached column class. (#11957 ) Important because an earlier call to getCachedColumn may have been done with a different class, leading to a ClassCastException on the second call. In the prior code, this could happen if a complex column had makeDimensionSelector called on it after makeColumnValueSelector had already been called.	2021-11-22 11:31:24 -08:00
Gian Merlino	d6507c9428	PrioritizedExecutorService: Properly wrap on direct calls to "execute". (#11956 ) Usually, "execute" is called by methods defined in the superclass AbstractExecutorService, and the passed-in Runnable has been wrapped by newTaskFor inside a PrioritizedListenableFutureTask. But this method can also be called directly, and if so, the same wrapping is necessary for the delegate to get a Runnable that can be entered into a priority queue with the others.	2021-11-22 10:30:12 -08:00
TSFenwick	a4cb1de87a	get rid of class cast exception and add a new testcase for that issue (#11951 )	2021-11-22 08:44:20 -08:00
jacobtolar	0a9a908031	Add inline native query example to tutorial (#11642 ) * Add inline native query example to tutorial Minor change to the tutorial that adds an example of a native HTTP query request body, and adds a link to the more detailed "native query over HTTP" documentation. * cleanup * Apply suggestions from code review. Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2021-11-22 21:35:05 +08:00
Sandeep	b1de56a3be	update Druid Chart README doc and removes unnecessary lock file (#11945 ) * update Druid Chart README doc and removes unnecessary lock file * update Druid Chart README doc and removes unnecessary lock file	2021-11-22 21:34:26 +08:00
XIAO WANG	f1cf1c8f39	update count distinct tests (#11927 ) Co-authored-by: wangxiao060 <wangxiao060@ke.com>	2021-11-22 21:34:00 +08:00
Peter Marshall	0c0001579d	Update compaction.md (#11937 ) Removed superfluous tabs that caused issues in rendering Added nav to the `inputSpec`	2021-11-22 21:33:47 +08:00
jacobtolar	3aee5d9ec3	Fix: invalid JSON in ingestion spec doc example (#11880 ) * Fix: invalid JSON in ingestion spec doc example * Update ingestion-spec.md	2021-11-22 21:33:26 +08:00
Frank Chen	e77938b205	Add thread count to pre-push hook to speed up checking (#11808 ) * Add thread count to accelerate checking * add comment	2021-11-22 21:33:01 +08:00
Frank Chen	cfd60f1222	Improve README for integration test (#11860 ) * Optimize IT readme * Resolve comments	2021-11-22 21:32:36 +08:00
Gian Merlino	b13f07a057	Harmonize local input sources; fix batch index integration test. (#11965 ) * Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list. Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the 4-segment case will always happen. Providing the file list in a specific order ensures that this will happen as expected by the tests. I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so changing it to a List was the simplest way to achieve the consistent ordering. I think it will also make the behavior make more sense if someone does specify the same input file multiple times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This is consistent with the behavior of other input sources like S3, GCS, HTTP. * Sort files in LocalFirehoseFactory.	2021-11-21 22:26:31 -08:00
Gian Merlino	cb0a2af644	TestKafkaExtractionCluster: Shut down Kafka, ZK in @After. (#11963 )	2021-11-20 15:17:05 -08:00
Frank Chen	2e3767bef0	Use the last ip as docker host ip (#11742 )	2021-11-20 13:31:39 +08:00
Gian Merlino	b3502c3e50	DruidViewMacro: Remove unused escalator field. (#11931 ) * DruidViewMacro: Remove unused escalator field. * Remove additional unused fields.	2021-11-19 16:06:29 -08:00
Clint Wylie	f260bbed23	restore and deprecate AggregatorFactory methods (#11917 ) * add back and deprecate aggregator factory methods so i can say i told you so when i delete these later * rename to make less ambiguous, fix fill method * adjust	2021-11-19 15:59:35 -08:00
Gian Merlino	36ee0367ff	Scan: Add "orderBy" parameter. (#11930 ) * Scan: Add "orderBy" parameter. This patch adds an API for requesting non-time orderings, although it does not actually add the ability to execute such queries. The changes are done in such a way that no matter how Scan query objects are constructed, they will have a correct "getOrderBy". This will enable us to switch the execution to exclusively use "getOrderBy" later on when it's implemented. Scan queries are serialized such that they only include "order" (time order) if the ordering is time-based, and they only include "orderBy" if the ordering is non-time-based. This maximizes compatibility with the existing API while also providing a clean look for formatted queries. Because this patch does not include execution logic, if someone actually tries to run a query with non-time ordering, then they will get an error like "Cannot execute query with orderBy [quality ASC]". * SQL module fixes. * Add spotbugs-exclude. * Remove unused method.	2021-11-19 08:19:12 -08:00
Nikhil Navadiya	3c51136098	Add worker category dimension (#11554 ) * Add worker category as dimension in TaskSlotCountStatsMonitor * Change description * Add workerConfig as field * Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics * Fixing tests * Fixing alerts * Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs * Resolving false positive spell check * addressing comments * throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>	2021-11-18 22:59:07 -08:00
Agustin Gonzalez	a4353aa1f4	Fix bug Unrecognized token 'No': was expecting (JSON String,...) when… (#11934 ) * Fix bug Unrecognized token 'No': was expecting (JSON String,...) when calling the API /druid/indexer/v1/task/taskId/reports and the report is not found * Also log other non-OK statuses	2021-11-18 10:29:28 -07:00
Gian Merlino	a04f99a950	Indexer: Demote WARN to DEBUG for tasks that don't register Appenderators. (#11939 )	2021-11-18 07:54:43 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
Gian Merlino	d76e646700	Fix TestServerInventoryView behavioral discrepancy. (#11932 ) Unlike a real one, TestServerInventoryView would call segmentRemoved any time _any_ segment was removed. It should only be called when _all_ segments have been removed.	2021-11-16 18:08:35 -08:00
Clint Wylie	7f0bede878	autocompaction support for complex dimensions (#11924 ) * autocompaction support for complex dimensions * more test	2021-11-16 15:57:44 -08:00
Clint Wylie	00c976a3fe	only get bitmap index for string dictionary encoded columns (#11925 )	2021-11-16 15:50:02 -08:00
Clint Wylie	54fead3546	sql skip reduce of complex literal expressions (#11928 )	2021-11-16 15:40:42 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransformStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
sthetland	02b578a3dd	Fixing a few typos and style issues (#11883 ) * grammar and format work * light writing touchup Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-16 10:13:35 -08:00
William Hyun	3abca73ee8	Upgrade ORC to 1.7.1 (#11919 )	2021-11-15 09:13:03 -08:00
Sandeep	3042c1776c	upgrade app version to 0.22.0 (#11872 ) Co-authored-by: Benedict Jin <asdf2014@apache.org>	2021-11-13 22:44:00 +08:00
Sandeep	400e90dc93	Remove Druid chart deprecation message and flag (#11897 )	2021-11-13 22:38:13 +08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Jihoon Son	f91868602d	Remove stale warning for HTTP inputSource (#11907 )	2021-11-13 10:27:14 +08:00
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
Agustin Gonzalez	a13a96d5e0	Avoid materializing list of segment files when finding a partition file during shuffle (#11903 ) * Avoid materializing list of segment files (it can cause OOM/memory pressure) as well as looping over the files. * Validate subTaskId	2021-11-11 10:51:52 -07:00
Kashif Faraz	223c5692a8	Add dimension partitioningType to metrics to track usage of different partitioning schemes (#11902 ) Add method ShardSpec.getType() to get name of shard spec type List all names of shard spec types in the interface ShardSpec itself for easy reference and maintenance Add dimension partitioningType to metric segment/added/bytes	2021-11-11 18:34:27 +05:30
Gian Merlino	fe2f7742f7	Fix incorrect comparison in RowSignature. (#11905 ) PR #11882 introduced a type comparison using ==, but while it was in flight, another PR #11713 changed the type enum to a class. So the comparison should properly be done with "equals".	2021-11-11 04:30:42 -08:00
Laksh Singla	57ed5127a7	Make subquery IDs more comprehensive (#11809 ) There are 3 types of query IDs - id, subQueryId, sqlQueryId. Currently, whenever a query generates subqueries, the subquery's subQueryId is populated randomly. Also, the subquery's id is not set to the parent query id. Therefore there is no way of linking the subqueries to the parent query, and one loses the ability to look at an end-to-end view of the query. This PR aims to implement the following couple of things: Populate the subqueries with their parent's id (and sqlQueryId if present) Populate the subQueryId such that it forms a hierarchical relationship among themselves. For example, if there is a query which launches a subquery, which in turn launches a couple of subqueries, then the ids and subQueryIds should have the following structure.	2021-11-11 16:31:56 +05:30
Atul Mohan	f9941c12c3	Reduce list operation calls when pulling segments from S3 (#11899 ) * Lazy lists * Fix objectsummary init	2021-11-10 19:13:46 -08:00
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
TSFenwick	cdd1c2876c	catch throwable because calcite is throwing an error not exception (#11892 ) * catch throwable because calcite is throwing an error not exception * add test case	2021-11-10 17:22:04 -08:00
Jihoon Son	13bec7468a	Fix NPE for SQL queries when a query parameter is missing in the mid (#11900 ) * Fix NPE for SQL queries when a query parameter is missing in the mid * checkstyle * Throw SqlPlanningException instead of IAE	2021-11-10 10:02:26 -08:00
Gian Merlino	14b0b4aee2	RowBasedSegment: Use Sequence instead of Iterable. (#11886 ) * RowBasedSegment: Use Sequence instead of Iterable. The main reason this is good is that Sequences can include baggage that must be closed after iteration is finished. This enables creating RowBasedSegments on top of closeable sequences of rows. To preserve the optimization that allows reversing a List without copying it, this patch also makes SimpleSequence its own class and allows extracting the Iterable that was used to create it. * Fix tests.	2021-11-10 06:06:52 -08:00
Gian Merlino	db4d157be6	Add Finalization option to RowSignature.addAggregators. (#11882 ) * Add Finalization option to RowSignature.addAggregators. This makes type signatures more useful when the caller knows whether it will be reading aggregation results in their finalized or intermediate types. * Fix call site.	2021-11-10 06:05:29 -08:00
Kashif Faraz	d3914c1a78	Ensure backward compatibility of multi dimension partitioning (#11889 ) This PR has changes to ensure backward compatibility of multi dimension partitioning such that if some middle managers are upgraded to a newer version, the cluster still functions normally for single_dim use cases.	2021-11-10 10:23:34 +05:30
Clint Wylie	a8805ab60d	add missing json type for ListFilteredVirtualColumn (#11887 ) * add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again * fixes * ugly, but maybe this * oops * too many mappers	2021-11-09 17:25:12 -08:00
Maytas Monsereenusorn	a36a41da73	Support routing data through an HTTP proxy (#11891 ) * Support routing data through an HTTP proxy * Support routing data through an HTTP proxy This adds the ability for the HttpClient to connect through an HTTP proxy. We augment the channel factory to check if it is supposed to be proxied and, if so, we connect to the proxy host first, issue a CONNECT command through to the final recipient host and then give the channel to the normal http client for usage. * add docs * address comments Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2021-11-09 17:24:06 -08:00
Gian Merlino	6c196a5ea2	Remove StorageAdapter.getColumnTypeName. (#11893 ) * Remove StorageAdapter.getColumnTypeName. It was only used by SegmentAnalyzer, and isn't necessary anymore due to the recent improvements to ColumnCapabilities. Also: tidy ColumnDescriptor.read slightly by removing an instanceof check, and moving the relevant logic into ComplexColumnPartSerde. * Fix spellings.	2021-11-09 15:18:07 -08:00
Gian Merlino	324d4374f6	HashJoinEngine: Fix extraneous advance of left cursor. (#11890 ) This could happen for right or full outer joins in certain cases. Tests weren't catching this because existing Cursor implementations generally ignore extraneous calls to "advance". So, to help catch this in tests, extra state validations are also added to RowWalker, which is used by RowBasedSegment.	2021-11-09 11:34:11 -08:00
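
The RowSignature fix listed above (fe2f7742f7) describes a comparison written with == that stopped being correct once the type enum became a class. The following is a minimal Java sketch of that general pitfall, not the actual Druid code: `TypeComparisonSketch`, `SketchType`, `sameByIdentity`, and `sameByEquals` are hypothetical names introduced only for illustration.

```java
import java.util.Objects;

public class TypeComparisonSketch {

    // Hypothetical stand-in for a type descriptor that used to be an enum
    // constant and is now an ordinary class with value-based equality.
    public static final class SketchType {
        private final String name;

        public SketchType(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SketchType)) {
                return false;
            }
            return name.equals(((SketchType) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    // Reference comparison: safe for enum constants, which are singletons,
    // but it only checks object identity for ordinary classes.
    public static boolean sameByIdentity(SketchType a, SketchType b) {
        return a == b;
    }

    // Value comparison: delegates to equals() and tolerates nulls.
    public static boolean sameByEquals(SketchType a, SketchType b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        SketchType a = new SketchType("LONG");
        SketchType b = new SketchType("LONG");

        System.out.println("identity: " + sameByIdentity(a, b)); // prints "identity: false"
        System.out.println("equals:   " + sameByEquals(a, b));   // prints "equals:   true"
    }
}
```

Two separately constructed instances with the same name are distinct objects, so the identity check fails while `equals` succeeds. With an enum, both checks would have agreed, which is why such a comparison can pass review and tests until the underlying type changes.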

11631 Commits