2839 Commits

Author SHA1 Message Date
Parag Jain
fd798d32bc fix testSecuredGetServer ut (#3262) 2016-07-20 10:20:13 -07:00
Gian Merlino
06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
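The three timings can be told apart with a small executor example. Wait time runs from submission until a worker picks up a sink's task, per-sink time covers running that task, and CPU time accumulates thread CPU across all sinks of one query. This is a minimal sketch that reuses only the metric names from the message above. The sink work is assumed to be plain `Runnable`s, and metrics are collected into a map instead of being emitted, so the `SinkQueryTimings` class is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class SinkQueryTimings {
    /**
     * Runs each (hypothetical) sink's work on the executor and records, in nanoseconds:
     * per-sink wait time (submit to start), per-sink run time, and total CPU time for the query.
     */
    public static Map<String, List<Long>> run(List<Runnable> sinks, ExecutorService exec) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, List<Long>> metrics = new ConcurrentHashMap<>();
        AtomicLong cpuNanos = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable sink : sinks) {
            long submitted = System.nanoTime();
            futures.add(exec.submit(() -> {
                long started = System.nanoTime();
                long cpuStart = threads.getCurrentThreadCpuTime();
                sink.run();
                long finished = System.nanoTime();
                record(metrics, "query/wait/time", started - submitted);
                // segmentAndCache and segment time carry the same value here, as the message notes.
                record(metrics, "query/segmentAndCache/time", finished - started);
                record(metrics, "query/segment/time", finished - started);
                cpuNanos.addAndGet(threads.getCurrentThreadCpuTime() - cpuStart);
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // propagate any failure from a sink
        }
        record(metrics, "query/cpu/time", cpuNanos.get()); // one value per query
        return metrics;
    }

    private static void record(Map<String, List<Long>> metrics, String name, long value) {
        metrics.computeIfAbsent(name, k -> Collections.synchronizedList(new ArrayList<>())).add(value);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            List<Runnable> sinks = List.of(() -> {}, () -> {});
            run(sinks, exec).forEach((name, values) -> System.out.println(name + " " + values));
        } finally {
            exec.shutdown();
        }
    }
}
```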
Himanshu
3f82108d15 optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250) 2016-07-17 18:47:52 -07:00
Gian Merlino
6cd1f5375b Better harmonized dimensions for query metrics. (#3245)
All query metrics now start with toolChest.makeMetricBuilder, and all of
*those* now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id"
moved to common code, since all query metrics added it anyway.

In particular this will add query-type specific dimensions like "threshold"
and "numDimensions" to servlet-originated metrics like query/time.
2016-07-14 11:55:51 -07:00
Hyukjin Kwon
55e7a52475 Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215) 2016-07-14 09:19:17 -07:00
Nishant
a1715c8cda fix-3237 (#3244)
DruidBroker use FilteredServerInventoryView instead of
ServerInventoryView
2016-07-13 22:30:35 -07:00
Charles Allen
a931debf79 Optionally intern ServerInventoryView inventory objects. (#3238) 2016-07-14 08:49:26 +05:30
Charles Allen
5d9fd0a713 Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming (#3043)
* Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming
* Missed query from #2859

* Make inReadOnlyTransaction part of SQLMetadataConnector
2016-07-06 16:55:27 -07:00
Himanshu
e1313e4b90 add log msg when event recvr firehose buffer is full (#3209) 2016-07-01 17:35:30 -05:00
Xavier Léauté
485e381387 remove datasource from hadoop output path (#3196)
fixes #2083, follow-up to #1702
2016-06-29 08:53:45 -07:00
Gian Merlino
4c9aeb7353 Revert "update druid console version (#3189)" (#3203)
This reverts commit 496b801bc39bec26a4084d6332a1da676091a5dd.
2016-06-29 08:29:57 -07:00
Xavier Léauté
496b801bc3 update druid console version (#3189) 2016-06-27 18:02:40 -07:00
Hyukjin Kwon
45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino
4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
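The second goal above, merging streams on the broker when there are no order-by columns, can be illustrated with a k-way merge. If every node returns its grouped rows already sorted by grouping key, the broker can combine rows with equal keys as they stream past and hold only one pending row per input instead of materializing the whole result. This is a minimal sketch and not the patch's actual implementation. Rows are simplified to a hypothetical `(key, count)` pair, the only aggregator is a sum, and merged rows are handed to a callback.

```java
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

public class SortedGroupByMerge {
    /** Hypothetical grouped row: one grouping key and one summed metric. */
    public static final class Row {
        public final String key;
        public final long count;
        public Row(String key, long count) { this.key = key; this.count = count; }
        @Override public String toString() { return key + "=" + count; }
    }

    /** Heap entry: the current head row of one input stream plus the rest of that stream. */
    private static final class Head {
        Row row;
        final Iterator<Row> rest;
        Head(Row row, Iterator<Row> rest) { this.row = row; this.rest = rest; }
    }

    /** Merges inputs that are each sorted by key, summing counts of equal keys, and streams results to sink. */
    public static void merge(List<Iterator<Row>> inputs, Consumer<Row> sink) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.row.key.compareTo(b.row.key));
        for (Iterator<Row> it : inputs) {
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            String key = head.row.key;
            long sum = head.row.count;
            advance(heap, head);
            // Combine every other stream whose current head has the same key.
            while (!heap.isEmpty() && heap.peek().row.key.equals(key)) {
                Head same = heap.poll();
                sum += same.row.count;
                advance(heap, same);
            }
            sink.accept(new Row(key, sum));
        }
    }

    /** Moves a stream to its next row and puts it back on the heap, if it has one. */
    private static void advance(PriorityQueue<Head> heap, Head head) {
        if (head.rest.hasNext()) {
            head.row = head.rest.next();
            heap.add(head);
        }
    }

    public static void main(String[] args) {
        List<Iterator<Row>> inputs = List.of(
            List.of(new Row("a", 1), new Row("c", 2)).iterator(),
            List.of(new Row("a", 3), new Row("b", 4)).iterator());
        merge(inputs, System.out::println); // a=4, then b=4, then c=2
    }
}
```

Keying the heap on the grouping key is what lets equal keys from different nodes meet at the top of the heap, so each merged row can be emitted as soon as no input has that key left.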
Dave Li
8a08398977 Add segment pruning based on secondary partition dimension (#2982)
* add get dimension rangeset to filters

* add get domain to ShardSpec and added chunk filter in caching clustered client

* add null check and modified not filter, started with unit test

* add filter test with caching

* refactor and some comments

* extract filtershard to helper function

* fixup

* minor changes

* update javadoc
2016-06-24 14:52:19 -07:00
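The idea behind this pruning fits in a few lines. With single-dimension partitioning, each chunk covers a contiguous range of one dimension's values, so a broker can skip any chunk whose range cannot contain a value the filter accepts. This is a minimal sketch under two assumptions: a hypothetical `Chunk` carries a start-inclusive, end-exclusive string range (null meaning unbounded), and the filter is reduced to a set of accepted values. The patch itself works with range sets obtained from filters and a domain exposed by `ShardSpec`, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ShardPruningSketch {
    /** Hypothetical chunk covering [start, end) of one partition dimension; null means unbounded. */
    public static final class Chunk {
        public final String id;
        public final String start;
        public final String end;
        public Chunk(String id, String start, String end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }

        boolean mayContain(String value) {
            boolean aboveStart = start == null || value.compareTo(start) >= 0;
            boolean belowEnd = end == null || value.compareTo(end) < 0;
            return aboveStart && belowEnd;
        }
    }

    /** Keeps only chunks that could hold at least one value accepted by the filter. */
    public static List<String> prune(List<Chunk> chunks, Set<String> acceptedValues) {
        List<String> kept = new ArrayList<>();
        for (Chunk c : chunks) {
            for (String v : acceptedValues) {
                if (c.mayContain(v)) {
                    kept.add(c.id);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("p0", null, "f"),
            new Chunk("p1", "f", "m"),
            new Chunk("p2", "m", null));
        System.out.println(prune(chunks, Set.of("apple", "kiwi"))); // [p0, p1]
    }
}
```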
Charles Allen
15f833a861 Make extension classloader caching keyed on directory (#3165)
* Make extension classloaders keyed by extension directory
* Fixes #3163

* Add in same-directory-name unit test
2016-06-23 17:13:19 -07:00
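Keying the cache on the extension directory rather than on the extension's name gives two extensions whose directories share a name, but live in different locations, separate classloaders. This is a minimal sketch of that caching pattern using only the JDK, with a hypothetical `ExtensionLoaders` holder. It puts the directory itself on the classpath for brevity; Druid's own loader setup, which also handles the jars inside each directory, differs.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ExtensionLoaders {
    // Cache keyed on the canonical directory, not on the directory's simple name.
    private static final ConcurrentMap<File, URLClassLoader> LOADERS = new ConcurrentHashMap<>();

    public static URLClassLoader forDirectory(File extensionDir) throws IOException {
        File key = extensionDir.getCanonicalFile();
        return LOADERS.computeIfAbsent(key, dir -> {
            try {
                URL[] urls = { dir.toURI().toURL() };
                return new URLClassLoader(urls, ExtensionLoaders.class.getClassLoader());
            } catch (MalformedURLException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Two directories with the same name under different parents.
        File first = new File(Files.createTempDirectory("extA").toFile(), "my-extension");
        File second = new File(Files.createTempDirectory("extB").toFile(), "my-extension");
        first.mkdirs();
        second.mkdirs();
        System.out.println(forDirectory(first) == forDirectory(first));  // true: same directory
        System.out.println(forDirectory(first) == forDirectory(second)); // false: same name, different path
    }
}
```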
michaelschiff
66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and (#3176)
'segment/underReplicated/count' (#3173)
2016-06-23 14:53:15 -07:00
Gian Merlino
ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Nishant
0d427923c0 fix caching for search results (#3119)
* fix caching for search results

properly read count when reading from cache.

* fix NPE during merging search count and add test

* Update cache key to invalidate prev results
2016-06-09 17:49:47 -07:00
Himanshu
ab4209c82a killDataSourceWhitelist in CoordinatorDynamicConfig accepts a comma-separated list of strings in addition to a JSON array of strings, so that the coordinator console can do the updates correctly (#3095) 2016-06-07 15:39:41 -07:00
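Accepting both forms amounts to normalizing whichever shape the JSON deserializer hands over. This is a minimal sketch with a hypothetical `parseWhitelist` helper that receives either a `String` (the comma-separated form) or a `Collection` (an already-parsed JSON array). Druid's actual deserialization code is not reproduced here.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WhitelistParsing {
    /** Normalizes either "a, b,c" or a collection parsed from a JSON array into a set of names. */
    public static Set<String> parseWhitelist(Object raw) {
        Set<String> result = new LinkedHashSet<>();
        if (raw == null) {
            return result;
        }
        if (raw instanceof String) {
            // Comma-separated form: split, trim, and drop empty entries.
            for (String part : ((String) raw).split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        } else if (raw instanceof Collection) {
            // JSON-array form: already split by the deserializer.
            for (Object item : (Collection<?>) raw) {
                result.add(String.valueOf(item).trim());
            }
        } else {
            throw new IllegalArgumentException("Expected a String or a JSON array, got: " + raw.getClass());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseWhitelist("wikipedia, twitter"));            // [wikipedia, twitter]
        System.out.println(parseWhitelist(List.of("wikipedia", "twitter"))); // [wikipedia, twitter]
    }
}
```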
Keuntae Park
e6b32c24ae bug fix for getNewNodes() in ListenerDiscoverer (#3092) 2016-06-07 16:32:42 +05:30
Gian Merlino
2db5f49f35 Fix JavaScriptConfig. (#3062) 2016-06-02 23:59:00 -07:00
Charles Allen
bbc5509078 Limit number of jetty threads that can be used by lookups (#3068) 2016-06-02 22:33:12 -07:00
Slim
545cdd63ab ensure the cleaning of overshadowed unloaded segments (#3048)
* ensure the cleaning of overshadowed unloaded segments

* add testing plus comments
2016-06-02 09:03:58 -05:00
Gian Merlino
874a0a4bdd MetadataResource: Fix handling of includeDisabled. (#3042) 2016-06-01 11:56:37 -07:00
John Wang
e662efa79f segment interface refactor for proposal 2965 (#2990) 2016-05-26 20:36:41 -07:00
John Wang
a004f1e1c5 appenderator plumber work from gian's branch (#2913) 2016-05-26 14:46:32 -07:00
Charles Allen
847501a939 Add better messages around LookupCoordinatorManager failures (#3027)
* Add better messages around LookupCoordinatorManager failures
* Catches #3026

* A few more little tests

* Add more forceful shutdown
2016-05-26 14:32:35 -05:00
jaehong choi
e2653a8cf4 handle a NPE in LookupCoordinatorManager.start() (#3026) 2016-05-26 09:55:33 -07:00
David Lim
3ef24c03b3 Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006)
* validate X-Druid-Task-Id header in request and add header to response

* modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant
2016-05-25 22:05:18 -07:00
Nishant
0ac1b27d53 Allow manually setting the shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting the shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
Gian Merlino
970614875b Fix race where results from an IncrementalIndexSegment could be cached. (#2983) 2016-05-18 13:57:50 +05:30
Charles Allen
15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Charles Allen
eaaad01de7 [QTL] Datasource as lookupTier (#2955)
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.

* Fix bad docs for lookups lookupTier

* Add Datasource name holder

* Move task and datasource to be pulled from Task file

* Make LookupModule pull from bound dataSource

* Fix test

* Fix code style on imports

* Fix formatting

* Make naming better

* Address code comments about naming
2016-05-17 15:44:42 -07:00
Xavier Léauté
e79284da59 new interval based cost function (#2972)
* new interval based cost function

Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression

New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices

* comments and formatting

* add more comments to explain the calculation
2016-05-17 09:56:00 -07:00
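The three properties listed above (constant cost per time-slice, additivity, exponential decay with distance) can all be met by defining the cost of two intervals as the double integral of an exponentially decaying kernel over both intervals: cost(A, B) = integral over x in A and y in B of exp(-lambda * |x - y|). The sketch below evaluates that integral in closed form. It is an illustration under that assumption, with the decay rate lambda and the time unit chosen by the caller; the 24-hour half-life in `main` is an assumed value, not one taken from the commit.

```java
public class IntervalCostSketch {
    /** H(t) = (e^(-lambda|t|) + lambda|t|) / lambda^2, chosen so that H''(t) = e^(-lambda|t|). */
    private static double h(double t, double lambda) {
        double u = lambda * Math.abs(t);
        return (Math.exp(-u) + u) / (lambda * lambda);
    }

    /**
     * Double integral of exp(-lambda * |x - y|) for x in [aStart, aEnd] and y in [bStart, bEnd].
     * Because H'' is the kernel, the integral is H(aEnd-bStart) - H(aEnd-bEnd) - H(aStart-bStart) + H(aStart-bEnd).
     * Interval endpoints must use the same time unit as 1 / lambda.
     */
    public static double intervalCost(double aStart, double aEnd, double bStart, double bEnd, double lambda) {
        return h(aEnd - bStart, lambda) - h(aEnd - bEnd, lambda)
             - h(aStart - bStart, lambda) + h(aStart - bEnd, lambda);
    }

    public static void main(String[] args) {
        double lambda = Math.log(2) / 24.0; // assumed: the kernel halves every 24 hours of separation
        // One-day intervals (in hours): adjacent versus 30 days apart.
        System.out.println(intervalCost(0, 24, 24, 48, lambda));
        System.out.println(intervalCost(0, 24, 720, 744, lambda));
        // Additivity: cost(A, B union C) equals cost(A, B) + cost(A, C) for adjacent B and C.
        double whole = intervalCost(0, 24, 24, 72, lambda);
        double parts = intervalCost(0, 24, 24, 48, lambda) + intervalCost(0, 24, 48, 72, lambda);
        System.out.println(Math.abs(whole - parts) <= 1e-9 * whole); // true
    }
}
```

Additivity falls out directly because an integral over a union of disjoint intervals is the sum of the integrals, and decay comes from the kernel itself, so no separate gap or recency penalty is needed.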
Parag Jain
681ffdb417 try to make DruidCoordinatorTest deterministic (#2967) 2016-05-13 14:43:28 -07:00
Nishant
a9b721a01b Allow user to set cost balancer threads more than or equal to the number of cores. (#2964)
* Allow user to set cost balancer threads more than the number of cores.

* modify test
2016-05-13 13:27:42 -05:00
Charles Allen
81cab8a7bb Make lookups more idempotent on update requests. (#2954)
* No longer fails if an update fails but it shouldn't have replaced it
2016-05-11 11:22:35 -07:00
Jonathan Wei
f2510cf125 Remove DataSchema equals() and hashcode() 2016-05-10 16:09:28 -07:00
Charles Allen
6332bd70f4 Add smile provider (#2951) 2016-05-10 16:03:39 -07:00
Charles Allen
0c04650e69 Lookup Announcer eager starting (#2944) 2016-05-10 12:21:47 +05:30
Charles Allen
454bb034f1 Nicer toString on ListeningAnnouncerConfig (#2936)
* Helps with debugging
2016-05-10 12:21:06 +05:30
David Lim
2cfd337378 Merge pull request #2933 from dclim/SQLMetadataSupervisorManagerTest-fix
add uuid to primary key for supervisor table
2016-05-09 10:41:32 -06:00
Nishant
a2dd57cf65 Optimize CostBalancerStrategy (#2910)
* Optimize CostBalancerStrategy

Ignore benchmark test in normal run

fix test

review comments

fix compilation

fix test

* review comments

* review comment
2016-05-05 08:29:08 -07:00
David Lim
b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
binlijin
841be5c61f periodically emit metric segment/scan/pending (#2854) 2016-05-02 22:38:13 -07:00
Himanshu
6c5bf91f9a publish metric numJettyConns to see how the number of active Jetty connections changes over time (#2839)
this can be compared with the number of active queries to see if requests are waiting in the Jetty queue
2016-05-02 14:08:25 -07:00
Charles Allen
6b957aa072 [QTL] Make URI Extraction Namespace take more sane arguments (#2738)
* Make URI Extraction Namespace take more sane arguments
* Fixes https://github.com/druid-io/druid/issues/2669

* Update docs

* Rename error message

* Undo overzealous deletion of docs

* Explain caching mechanism a bit more in docs
2016-05-02 12:54:34 -07:00
David Lim
5f0a9ccc57 fix ClassCastException in FiniteAppenderatorDriver (#2896) 2016-04-28 18:39:24 -07:00
Parag Jain
0d745ee120 Basic authorization support in Druid (#2424)
- Introduce `AuthorizationInfo` interface, specific implementations of which would be provided by extensions
- If `druid.auth.enabled` is set to `true`, then the `isAuthorized` method of `AuthorizationInfo` will be called to perform authorization checks
- The `AuthorizationInfo` object will be created in the servlet filters of the specific extension and will be passed as a request attribute named `AuthConfig.DRUID_AUTH_TOKEN`
- Within the scope of this PR, all resources that need to be secured are divided into 3 types: `DATASOURCE`, `CONFIG` and `STATE`. For any type of resource, the possible actions are `READ` or `WRITE`
- Specific ResourceFilters are used to perform auth checks for all endpoints that correspond to a specific resource type. This avoids duplicating logic and removes the need to inject HttpServletRequest into each endpoint. For example:
 - `DatasourceResourceFilter` is used for endpoints where the datasource information is present after the "datasources" segment in the request path, such as `/druid/coordinator/v1/datasources/`, `/druid/coordinator/v1/metadata/datasources/`, `/druid/v2/datasources/`
 - `RulesResourceFilter` is used for endpoints where the datasource information is present after the "rules" segment in the request path, such as `/druid/coordinator/v1/rules/`
 - `TaskResourceFilter` is used for endpoints where the datasource information is present after the "task" segment in the request path, such as `druid/indexer/v1/task`
 - `ConfigResourceFilter` is used for endpoints like `/druid/coordinator/v1/config`, `/druid/indexer/v1/worker`, `/druid/worker/v1`, etc.
 - `StateResourceFilter` is used for endpoints like `/druid/broker/v1/loadstatus`, `/druid/coordinator/v1/leader`, `/druid/coordinator/v1/loadqueue`, `/druid/coordinator/v1/rules`, etc.
- For endpoints that return a list of resources, like `/druid/coordinator/v1/datasources`, `/druid/indexer/v1/completeTasks`, etc., the list is filtered to return only the resources to which the requesting user has access. In these cases, the `HttpServletRequest` instance needs to be injected into the endpoint method.

Note:
The JAX-RS specification provides an interface called `SecurityContext`. We did not use it and instead provided our own interface, `AuthorizationInfo`, mainly because it is more flexible. For example, `SecurityContext` has a method called `isUserInRole(String role)` that would be used for auth checks. Using it would require modeling the mapping of which roles can access which resources inside Druid, either by some convention or by other means, which is not very flexible because Druid has dynamic resources such as datasources. Fixes #2355 with PR #2424
2016-04-28 16:50:28 -07:00
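The pieces described in this commit fit together roughly as follows: resource types, `READ`/`WRITE` actions, an `AuthorizationInfo` object consulted per resource, and list endpoints that filter their output. This is a minimal sketch only. The `Resource` class, the boolean return type of `isAuthorized`, and the `filterDatasources` helper are assumptions made for illustration and are not the interfaces this PR added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AuthSketch {
    public enum ResourceType { DATASOURCE, CONFIG, STATE }

    public enum Action { READ, WRITE }

    /** Hypothetical resource descriptor: a name plus its type. */
    public static final class Resource {
        public final String name;
        public final ResourceType type;
        public Resource(String name, ResourceType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Simplified stand-in for the extension-provided authorization object (assumed signature). */
    public interface AuthorizationInfo {
        boolean isAuthorized(Resource resource, Action action);
    }

    /** A list endpoint returns only the datasources the requesting user may read. */
    public static List<String> filterDatasources(List<String> all, AuthorizationInfo auth) {
        List<String> visible = new ArrayList<>();
        for (String ds : all) {
            if (auth.isAuthorized(new Resource(ds, ResourceType.DATASOURCE), Action.READ)) {
                visible.add(ds);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical user who may read only the "wikipedia" datasource.
        Set<String> readable = Set.of("wikipedia");
        AuthorizationInfo user = (resource, action) ->
            resource.type == ResourceType.DATASOURCE
                && action == Action.READ
                && readable.contains(resource.name);
        System.out.println(filterDatasources(List.of("wikipedia", "twitter"), user)); // [wikipedia]
    }
}
```

Keeping the decision behind a single interface means the mapping from users to resources lives entirely in the extension, which is the flexibility argument made in the note above.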