druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	b13f07a057	Harmonize local input sources; fix batch index integration test. (#11965 ) * Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list. Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the 4-segment case will always happen. Providing the file list in a specific order ensures that this will happen as expected by the tests. I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so changing it to a List was the simplest way to achieve the consistent ordering. I think it will also make the behavior make more sense if someone does specify the same input file multiple times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This is consistent with the behavior of other input sources like S3, GCS, HTTP. * Sort files in LocalFirehoseFactory.	2021-11-21 22:26:31 -08:00
Gian Merlino	cb0a2af644	TestKafkaExtractionCluster: Shut down Kafka, ZK in @After. (#11963 )	2021-11-20 15:17:05 -08:00
Frank Chen	2e3767bef0	Use the last ip as docker host ip (#11742 )	2021-11-20 13:31:39 +08:00
Gian Merlino	b3502c3e50	DruidViewMacro: Remove unused escalator field. (#11931 ) * DruidViewMacro: Remove unused escalator field. * Remove additional unused fields.	2021-11-19 16:06:29 -08:00
Clint Wylie	f260bbed23	restore and deprecate AggregatorFactory methods (#11917 ) * add back and deprecate aggregator factory methods so i can say i told you so when i delete these later * rename to make less ambiguous, fix fill method * adjust	2021-11-19 15:59:35 -08:00
Gian Merlino	36ee0367ff	Scan: Add "orderBy" parameter. (#11930 ) * Scan: Add "orderBy" parameter. This patch adds an API for requesting non-time orderings, although it does not actually add the ability to execute such queries. The changes are done in such a way that no matter how Scan query objects are constructed, they will have a correct "getOrderBy". This will enable us to switch the execution to exclusively use "getOrderBy" later on when it's implemented. Scan queries are serialized such that they only include "order" (time order) if the ordering is time-based, and they only include "orderBy" if the ordering is non-time-based. This maximizes compatibility with the existing API while also providing a clean look for formatted queries. Because this patch does not include execution logic, if someone actually tries to run a query with non-time ordering, then they will get an error like "Cannot execute query with orderBy [quality ASC]". * SQL module fixes. * Add spotbugs-exclude. * Remove unused method.	2021-11-19 08:19:12 -08:00
Nikhil Navadiya	3c51136098	Add worker category dimension (#11554 ) * Add worker category as dimension in TaskSlotCountStatsMonitor * Change description * Add workerConfig as field * Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics * Fixing tests * Fixing alerts * Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs * Resolving false positive spell check * addressing comments * throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>	2021-11-18 22:59:07 -08:00
Agustin Gonzalez	a4353aa1f4	Fix bug Unrecognized token 'No': was expecting (JSON String,...) when… (#11934 ) * Fix bug Unrecognized token 'No': was expecting (JSON String,...) when calling the API /druid/indexer/v1/task/taskId/reports and the report is not found * Also log other non-OK statuses	2021-11-18 10:29:28 -07:00
Gian Merlino	a04f99a950	Indexer: Demote WARN to DEBUG for tasks that don't register Appenderators. (#11939 )	2021-11-18 07:54:43 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
Gian Merlino	d76e646700	Fix TestServerInventoryView behavioral discrepancy. (#11932 ) Unlike a real one, TestServerInventoryView would call segmentRemoved any time _any_ segment was removed. It should only be called when _all_ segments have been removed.	2021-11-16 18:08:35 -08:00
Clint Wylie	7f0bede878	autocompaction support for complex dimensions (#11924 ) * autocompaction support for complex dimensions * more test	2021-11-16 15:57:44 -08:00
Clint Wylie	00c976a3fe	only get bitmap index for string dictionary encoded columns (#11925 )	2021-11-16 15:50:02 -08:00
Clint Wylie	54fead3546	sql skip reduce of complex literal expressions (#11928 )	2021-11-16 15:40:42 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
sthetland	02b578a3dd	Fixing a few typos and style issues (#11883 ) * grammar and format work * light writing touchup Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-16 10:13:35 -08:00
William Hyun	3abca73ee8	Upgrade ORC to 1.7.1 (#11919 )	2021-11-15 09:13:03 -08:00
Sandeep	3042c1776c	upgrade app version to 0.22.0 (#11872 ) Co-authored-by: Benedict Jin <asdf2014@apache.org>	2021-11-13 22:44:00 +08:00
Sandeep	400e90dc93	Remove Druid chart deprecation message and flag (#11897 )	2021-11-13 22:38:13 +08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Jihoon Son	f91868602d	Remove stale warning for HTTP inputSource (#11907 )	2021-11-13 10:27:14 +08:00
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
Agustin Gonzalez	a13a96d5e0	Avoid materializing list of segment files when finding a partition file during shuffle (#11903 ) * Avoid materializing list of segment files (it can cause OOM/memory pressure) as well as looping over the files. * Validate subTaskId	2021-11-11 10:51:52 -07:00
Kashif Faraz	223c5692a8	Add dimension partitioningType to metrics to track usage of different partitioning schemes (#11902 ) Add method ShardSpec.getType() to get name of shard spec type List all names of shard spec types in the interface ShardSpec itself for easy reference and maintenance Add dimension partitioningType to metric segment/added/bytes	2021-11-11 18:34:27 +05:30
Gian Merlino	fe2f7742f7	Fix incorrect comparison in RowSignature. (#11905 ) PR #11882 introduced a type comparison using ==, but while it was in flight, another PR #11713 changed the type enum to a class. So the comparison should properly be done with "equals".	2021-11-11 04:30:42 -08:00
Laksh Singla	57ed5127a7	Make subquery IDs more comprehensive (#11809 ) There are 3 types of query IDs - id, subQueryId, sqlQueryId. Currently, whenever a query generates subqueries, the subquery's subQueryId is populated randomly. Also, the subquery's id is not set to the parent query's id. Therefore there is no way of linking the subqueries to the parent query, and one loses the ability to look at an end-to-end view of the query. This PR aims to implement the following couple of things: Populate the subqueries with their parent's id (and sqlQueryId if present) Populate the subQueryId such that it forms a hierarchical relationship among the subqueries. For example, if there is a query which launches a subquery, which in turn launches a couple of subqueries, then the ids and subQueryIds should have following structure.	2021-11-11 16:31:56 +05:30
Atul Mohan	f9941c12c3	Reduce list operation calls when pulling segments from S3 (#11899 ) * Lazy lists * Fix objectsummary init	2021-11-10 19:13:46 -08:00
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
TSFenwick	cdd1c2876c	catch throwable because calcite is throwing an error not exception (#11892 ) * catch throwable because calcite is throwing an error not exception * add test case	2021-11-10 17:22:04 -08:00
Jihoon Son	13bec7468a	Fix NPE for SQL queries when a query parameter is missing in the mid (#11900 ) * Fix NPE for SQL queries when a query parameter is missing in the mid * checkstyle * Throw SqlPlanningException instead of IAE	2021-11-10 10:02:26 -08:00
Gian Merlino	14b0b4aee2	RowBasedSegment: Use Sequence instead of Iterable. (#11886 ) * RowBasedSegment: Use Sequence instead of Iterable. The main reason this is good is that Sequences can include baggage that must be closed after iteration is finished. This enables creating RowBasedSegments on top of closeable sequences of rows. To preserve the optimization that allows reversing a List without copying it, this patch also makes SimpleSequence its own class and allows extracting the Iterable that was used to create it. * Fix tests.	2021-11-10 06:06:52 -08:00
Gian Merlino	db4d157be6	Add Finalization option to RowSignature.addAggregators. (#11882 ) * Add Finalization option to RowSignature.addAggregators. This makes type signatures more useful when the caller knows whether it will be reading aggregation results in their finalized or intermediate types. * Fix call site.	2021-11-10 06:05:29 -08:00
Kashif Faraz	d3914c1a78	Ensure backward compatibility of multi dimension partitioning (#11889 ) This PR has changes to ensure backward compatibility of multi dimension partitioning such that if some middle managers are upgraded to a newer version, the cluster still functions normally for single_dim use cases.	2021-11-10 10:23:34 +05:30
Clint Wylie	a8805ab60d	add missing json type for ListFilteredVirtualColumn (#11887 ) * add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again * fixes * ugly, but maybe this * oops * too many mappers	2021-11-09 17:25:12 -08:00
Maytas Monsereenusorn	a36a41da73	Support routing data through an HTTP proxy (#11891 ) * Support routing data through an HTTP proxy * Support routing data through an HTTP proxy This adds the ability for the HttpClient to connect through an HTTP proxy. We augment the channel factory to check if it is supposed to be proxied and, if so, we connect to the proxy host first, issue a CONNECT command through to the final recipient host and then give the channel to the normal http client for usage. * add docs * address comments Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2021-11-09 17:24:06 -08:00
Gian Merlino	6c196a5ea2	Remove StorageAdapter.getColumnTypeName. (#11893 ) * Remove StorageAdapter.getColumnTypeName. It was only used by SegmentAnalyzer, and isn't necessary anymore due to the recent improvements to ColumnCapabilities. Also: tidy ColumnDescriptor.read slightly by removing an instanceof check, and moving the relevant logic into ComplexColumnPartSerde. * Fix spellings.	2021-11-09 15:18:07 -08:00
Gian Merlino	324d4374f6	HashJoinEngine: Fix extraneous advance of left cursor. (#11890 ) This could happen for right or full outer joins in certain cases. Tests weren't catching this because existing Cursor implementations generally ignore extraneous calls to "advance". So, to help catch this in tests, extra state validations are also added to RowWalker, which is used by RowBasedSegment.	2021-11-09 11:34:11 -08:00
Gian Merlino	babf00f8e3	Migrate File.mkdirs to FileUtils.mkdirp. (#11879 ) * Migrate File.mkdirs to FileUtils.mkdirp. * Remove unused imports. * Fix LookupReferencesManager. * Simplify. * Also migrate usages of forceMkdir. * Fix var name. * Fix incorrect call. * Update test.	2021-11-09 11:10:49 -08:00
Gian Merlino	945a341acd	RowBasedCursor: Add column-value-reuse optimization. (#11884 ) * RowBasedCursor: Add column-value-reuse optimization. Most of the logic is in RowBasedColumnSelectorFactory, although in this patch its only user is RowBasedCursor. This improves performance of features that use RowBasedSegment, like lookup and inline datasources. It's especially helpful for inline datasources that contain lengthy arrays, due to the fact that the transformed array can be reused. * Changes from code review. * Fixes for ColumnCapabilitiesImplTest.	2021-11-09 07:18:09 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Gian Merlino	a5bd0b8cc0	RowAdapter: Add a default implementation for timestampFunction. (#11885 ) Enables simpler implementations for adapters that want to treat the timestamp as "just another column".	2021-11-08 10:25:13 -08:00
Clint Wylie	7237dc837c	complex typed expressions (#11853 ) * complex typed expressions * add built-in hll collector expressions to get coverage on druid-processing, more types, more better * rampage!!! * more javadoc * adjustments * oops * lol * remove unused dependency * contradiction? * more test	2021-11-08 00:33:06 -08:00
Jian Wang	8e7e679984	Add more metrics for Jetty server thread pool usage (#11113 ) Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.	2021-11-07 16:51:44 +05:30
Kashif Faraz	2d77e1a3c6	Add support for multi dimension range partitioning (#11848 ) This PR adds support for range partitioning on multiple dimensions. It extends on the concept and implementation of single dimension range partitioning. The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension. The start and end values of a DimensionRangeShardSpec are represented by StringTuples, where each String in the tuple is the value of a partition dimension.	2021-11-06 12:50:17 +05:30
Gian Merlino	1c12dd97dc	Add javadocs to StringUtils.fromUtf8. (#11881 ) They clarify that the methods advance the position of the buffer.	2021-11-05 15:27:24 -07:00
Gian Merlino	8971056763	Properly count segment references in tests. (#11870 )	2021-11-05 12:49:10 -07:00
Clint Wylie	907e4ca0c5	use correct DimensionSpec with for column value selectors created from dictionary encoded column indexers (#11873 ) * use correct dimension spec for column value selectors of dictionary encoded column indexers	2021-11-05 01:51:15 -07:00
zachjsh	1d6df48145	Warn if cache size of lookup is beyond max size (#11863 ) Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality is to make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in to warn when its cache is beyond maxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the service's configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underlying table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this pr makes it so that the jdbc query results are instead streamed through an iterator.	2021-11-03 21:32:22 -04:00
Abhishek Agarwal	652e1491e0	Update default values for tuning parameters in kinesis data loader (#11867 )	2021-11-02 23:51:28 +05:30
Karan Kumar	cf27366b35	Fixing typos in docker build scripts (#11866 )	2021-11-02 23:50:52 +05:30
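
Note on fe2f7742f7 (#11905): the message says a type comparison written with == stopped being correct once the type changed from an enum to a class. The sketch below illustrates that general Java behavior only; it is not the Druid code. `SketchType`, `sameByIdentity`, and `sameByEquals` are hypothetical names made up for this illustration, and the real RowSignature logic is not reproduced here.

```java
import java.util.Objects;

public class TypeComparisonSketch {
    // Hypothetical stand-in for a type that used to be an enum and is now a class.
    // Enum constants are singletons, so == was safe for them; class instances are not.
    static final class SketchType {
        private final String name;

        SketchType(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SketchType)) {
                return false;
            }
            return name.equals(((SketchType) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    // Reference comparison: true only when both variables point at the same object.
    static boolean sameByIdentity(SketchType a, SketchType b) {
        return a == b;
    }

    // Value comparison: delegates to equals() and tolerates nulls.
    static boolean sameByEquals(SketchType a, SketchType b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        SketchType a = new SketchType("LONG");
        SketchType b = new SketchType("LONG");
        System.out.println("identity: " + sameByIdentity(a, b));
        System.out.println("equals:   " + sameByEquals(a, b));
    }
}
```

Two separately constructed instances with the same name compare unequal under == but equal under equals, which is the kind of discrepancy the commit message describes.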


12018 Commits, All Branches