Commit Graph

2758 Commits

Author SHA1 Message Date
Gian Merlino 970614875b Fix race where results from an IncrementalIndexSegment could be cached. (#2983) 2016-05-18 13:57:50 +05:30
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Charles Allen eaaad01de7 [QTL] Datasource as lookupTier (#2955)
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.

* Fix bad docs for lookups lookupTier

* Add Datasource name holder

* Move task and datasource to be pulled from Task file

* Make LookupModule pull from bound dataSource

* Fix test

* Fix code style on imports

* Fix formatting

* Make naming better

* Address code comments about naming
2016-05-17 15:44:42 -07:00
Xavier Léauté e79284da59 new interval based cost function (#2972)
* new interval based cost function

Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression

New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices

* comments and formatting

* add more comments to explain the calculation
2016-05-17 09:56:00 -07:00
Parag Jain 681ffdb417 try to make DruidCoordinatorTest deterministic (#2967) 2016-05-13 14:43:28 -07:00
Nishant a9b721a01b Allow user to set cost balancer threads more than or equal to the number of cores. (#2964)
* Allow user to set cost balancer threads more than the number of cores.

Allow user to set cost balancer threads more than the number of cores.

* modify test
2016-05-13 13:27:42 -05:00
Charles Allen 81cab8a7bb Make lookups more idempotent on update requests. (#2954)
* No longer fails if an update fails but it shouldn't have replaced it
2016-05-11 11:22:35 -07:00
Jonathan Wei f2510cf125 Remove DataSchema equals() and hashcode() 2016-05-10 16:09:28 -07:00
Charles Allen 6332bd70f4 Add smile provider (#2951) 2016-05-10 16:03:39 -07:00
Charles Allen 0c04650e69 Lookup Announcer eager starting (#2944) 2016-05-10 12:21:47 +05:30
Charles Allen 454bb034f1 Nicer toString on ListneingAnnouncerConfig (#2936)
* Helps with debugging
2016-05-10 12:21:06 +05:30
David Lim 2cfd337378 Merge pull request #2933 from dclim/SQLMetadataSupervisorManagerTest-fix
add uuid to primary key for supervisor table
2016-05-09 10:41:32 -06:00
Nishant a2dd57cf65 Optimize CostBalancerStrategy (#2910)
* Optimize CostBalancerStrategy

Ignore benchmark test in normal run

fix test

review comments

fix compilation

fix test

* review comments

* review comment
2016-05-05 08:29:08 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
binlijin 841be5c61f periodically emit metric segment/scan/pending (#2854) 2016-05-02 22:38:13 -07:00
Himanshu 6c5bf91f9a publish metrics numJettyConns to see how number of active jetty connections change over time (#2839)
this can be compared with numer of active queries to see if requests are waiting in jetty queue
2016-05-02 14:08:25 -07:00
Charles Allen 6b957aa072 [QTL] Make URI Exctraction Namespace take more sane arguments (#2738)
* Make URI Exctraction Namespace take more sane arguments
* Fixes https://github.com/druid-io/druid/issues/2669

* Update docs

* Rename error message

* Undo overzealous deletion of docs

* Explain caching mechanism a bit more in docs
2016-05-02 12:54:34 -07:00
David Lim 5f0a9ccc57 fix ClassCastException in FiniteAppenderatorDriver (#2896) 2016-04-28 18:39:24 -07:00
Parag Jain 0d745ee120 Basic authorization support in Druid (#2424)
- Introduce `AuthorizationInfo` interface, specific implementations of which would be provided by extensions
- If the `druid.auth.enabled` is set to `true` then the `isAuthorized` method of `AuthorizationInfo` will be called to perform authorization checks
-  `AuthorizationInfo` object will be created in the servlet filters of specific extension and will be passed as a request attribute with attribute name as `AuthConfig.DRUID_AUTH_TOKEN`
- As per the scope of this PR, all resources that needs to be secured are divided into 3 types - `DATASOURCE`, `CONFIG` and `STATE`. For any type of resource, possible actions are  - `READ` or `WRITE`
- Specific ResourceFilters are used to perform auth checks for all endpoints that corresponds to a specific resource type. This prevents duplication of logic and need to inject HttpServletRequest inside each endpoint. For example
 - `DatasourceResourceFilter` is used for endpoints where the datasource information is present after "datasources" segment in the request Path such as `/druid/coordinator/v1/datasources/`, `/druid/coordinator/v1/metadata/datasources/`, `/druid/v2/datasources/`
 - `RulesResourceFilter` is used where the datasource information is present after "rules" segment in the request Path such as `/druid/coordinator/v1/rules/`
 - `TaskResourceFilter` is used for endpoints is used where the datasource information is present after "task" segment in the request Path such as `druid/indexer/v1/task`
 - `ConfigResourceFilter` is used for endpoints like `/druid/coordinator/v1/config`, `/druid/indexer/v1/worker`, `/druid/worker/v1` etc
 - `StateResourceFilter` is used for endpoints like `/druid/broker/v1/loadstatus`, `/druid/coordinator/v1/leader`, `/druid/coordinator/v1/loadqueue`, `/druid/coordinator/v1/rules` etc
- For endpoints where a list of resources is returned like `/druid/coordinator/v1/datasources`, `/druid/indexer/v1/completeTasks` etc. the list is filtered to return only the resources to which the requested user has access. In these cases, `HttpServletRequest` instance needs to be injected in the endpoint method.

Note -
JAX-RS specification provides an interface called `SecurityContext`. However, we did not use this but provided our own interface `AuthorizationInfo` mainly because it provides more flexibility. For example, `SecurityContext` has a method called `isUserInRole(String role)` which would be used for auth checks and if used then the mapping of what roles can access what resource needs to be modeled inside Druid either using some convention or some other means which is not very flexible as Druid has dynamic resources like datasources. Fixes #2355 with PR #2424
2016-04-28 16:50:28 -07:00
Nishant c29cb7d711 add pending task based resource management strategy (#2086) 2016-04-27 10:40:53 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00
Gian Merlino c74391e54c JavaScript: Ability to disable. (#2853)
Fixes #2852.
2016-04-21 09:43:15 -05:00
Xavier Léauté 5938d9085b Stream segments from database (#2859)
* Avoids fetching all segment records into heap by JDBC driver
* Set connection to read-only to help database optimize queries
* Update JDBC drivers (MySQL has fixes for streaming results)
2016-04-21 05:40:07 +08:00
Xavier Léauté fc91120b54 Merge pull request #2857 from metamx/upgrade-zk
upgrade zookeeper client dependency to 3.4.8
2016-04-20 10:36:07 +05:30
Nishant dbf63f738f Add ability to filter segments for specific dataSources on broker without creating tiers (#2848)
* Add back FilteredServerView removed in a32906c7fd to reduce memory usage using watched tiers.

* Add functionality to specify "druid.broker.segment.watchedDataSources"
2016-04-19 10:10:06 -07:00
Gian Merlino 08c784fbf6 KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844)
segment creation deterministic.

This means that each segment will contain data from just one Kafka
partition. So, users will probably not want to have a super high number
of Kafka partitions...

Fixes #2703.
2016-04-18 22:29:52 -07:00
Jisoo Kim 7b65ca7889 refactor ClientQuerySegmentWalker (#2837)
* refactor ClientQuerySegmentWalker

* add header to FluentQueryRunnerBuilder

* refactor QueryRunnerTestHelper
2016-04-18 14:00:47 -07:00
binlijin c1e690288c Improve some log (#2807) 2016-04-15 09:34:26 -07:00
Nishant 632b21472b fix test failure (#2818)
formatting changes
2016-04-14 21:40:19 -07:00
Fangjin Yang 886ee4e30d Merge pull request #2821 from metamx/review-comments-2784
handle review comments for PR 2784
2016-04-12 10:20:43 -07:00
Fangjin Yang b486eff6b7 Merge pull request #2805 from metamx/query-time-start
request log should reflect time the query was received
2016-04-12 09:44:42 -07:00
Nishant deb6ecf919 handle review comments for PR 2784
https://github.com/druid-io/druid/pull/2784#discussion_r59062021
2016-04-12 21:52:00 +05:30
Himanshu Gupta aa6a230c90 remove DruidSQL.g4, its failing with newer version of ANTLR, will bring it back and fix if needed later 2016-04-08 11:46:21 -05:00
Xavier Léauté d4d1d615c1 request log should reflect time the query was received, as opposed to processed 2016-04-07 12:39:34 -07:00
Nishant edd74f2b67 Allow Lite DataSegment Announcements
separate config for each skipping dimensions, metrics and loadSpec

Add test

fix test comment

Add docs
2016-04-07 18:24:12 +05:30
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Xavier Léauté 728da75224 remove unused code 2016-04-01 13:10:35 -07:00
Fangjin Yang 9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Fangjin Yang 7a84c267f7 Merge pull request #2743 from metamx/fixlookuptest
Fix LookupCoordinatorManager and Test for alerts
2016-03-28 14:41:37 -07:00
Parag Jain 89a8277ae2 Merge pull request #2712 from guobingkun/make_runnables_pluggable
make Coordinator IndexingService helpers pluggable
2016-03-28 12:18:18 -05:00
Charles Allen 05151bc325 Fix LookupCoordinatorManagerTest for alerts
* Also fixes bad alerting on missing nodes
2016-03-28 09:41:47 -07:00
Fangjin Yang 3c4691aa5a Merge pull request #2741 from gianm/examples-wiki
Downgrade geoip2, exclude com.google.http-client.
2016-03-25 23:08:38 -07:00
Xavier Léauté 01f3221a62 Merge pull request #2665 from jisookim0513/remove-druid-server-serialization
remove serialization of DruidServer
2016-03-25 15:54:05 -07:00
Gian Merlino 977e867ad8 Downgrade geoip2, exclude com.google.http-client.
Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client
from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting
to run Druid along with an upgraded com.google.http-client) while preventing
Jackson conflicts pointed out in #2717.

Fixes #2717.

This reverts commit 21b7572533.
2016-03-25 14:43:22 -07:00
jisookim 0d3c5a3b6c remove serialization of Druid Server and add tests for ServersResource 2016-03-25 12:27:27 -07:00
Bingkun Guo 0872448ff0 make Coordinator IndexingService helpers pluggable
Fixes #2682
IndexingService helpers are added according to the settings in runtime.properties.
Rather than having all the config.isXXX checks there, it makes sense to have a pluggable
approach for allowing the dynamic configuration to bring in implementations for helpers
without having to have hard-coded sets of available helpers. Plus, it will also make it possible for extensions to plug helpers in.

With https://github.com/druid-io/druid-api/pull/76, we could conditionally bind a helper to Coordinator's runlist.
The condition is driven by the value set in the runtime.properties.
2016-03-25 11:48:54 -05:00
Himanshu Gupta e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Gian Merlino 713062053c Filters: Add filter.toFilter method, use that instead of the instanceof chain in Filters.
I believe that the instanceof chain in Filters exists because in the past, Filter
and DimFilter were in different packages (DimFilter was in druid-client and Filter
was in druid-processing). And since druid-client didn't depend on druid-processing,
DimFilter couldn't have a toFilter method. But now it can.
2016-03-23 17:03:49 -07:00
kilida f25b2ed6f8 Duplicate statement in ReservoirSegmentSamplerTest.java 2016-03-22 22:14:36 -04:00
Fangjin Yang 826b371259 Merge pull request #2697 from guobingkun/remove_duplicate_version_converter
remove duplicated DruidCoordinatorVersionConverter
2016-03-22 15:48:09 -07:00
Bingkun Guo a6e9ff48ec Merge pull request #2688 from pjain1/props_cli
do not inject properties directly in module
2016-03-22 15:27:19 -05:00
Bingkun Guo 3778adf1f4 remove duplicated DruidCoordinatorVersionConverter 2016-03-22 14:45:52 -05:00
Parag Jain 7b93195dc6 do not inject properties directly in module 2016-03-22 14:30:10 -05:00
Himanshu 00d7021291 Merge pull request #2607 from jon-wei/dim_schema
Support use of DimensionSchema class in DimensionsSpec
2016-03-22 11:53:46 -05:00
Himanshu 3220b109ad Merge pull request #2570 from binlijin/single_dimension_partitioning
Single dimension hash-based partitioning
2016-03-22 11:51:06 -05:00
binlijin bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
jon-wei a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Xavier Léauté 25967d0ed8 fix servlet startup sequence, fixes #2681 2016-03-18 15:06:15 -07:00
Charles Allen 5da9a280b6 Query Time Lookup - Dynamic Configuration 2016-03-18 09:45:05 -07:00
Charles Allen 45c413af7e Merge pull request #2674 from metamx/fix-broadcast-lockup
separate HTTP client pool for cancellation requests
2016-03-17 15:23:42 -07:00
Xavier Léauté 1718a7224b separate HTTP pool for cancellation requests
* reduces contention between queries and cancellation requests
* more aggressive timeouts for cancellation requests
2016-03-17 12:11:18 -07:00
Charles Allen c716af5b04 Merge pull request #2678 from metamx/fixImports
Fix some google related imports
2016-03-17 11:53:16 -07:00
Charles Allen a52c6d3bee Fix some google related imports 2016-03-17 11:03:29 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Parag Jain 948b19a088 do not silently ingnore rows 2016-03-16 09:30:19 -05:00
Fangjin Yang ec949d76e3 Merge pull request #2655 from navis/hint-coordinator-client
Add hint message for missing `druid.selectors.coordinator.serviceName`
2016-03-14 20:57:40 -07:00
Jonathan Wei 5ec5ac92c6 Merge pull request #2382 from himanshug/broker_segment_tier_selection
at broker, if configured, only add segments from specific tiers to the timeline
2016-03-14 16:53:06 -07:00
navis.ryu 83e1d5d7bf Add hint message for missing `druid.selectors.coordinator.serviceName` 2016-03-15 08:39:07 +09:00
Fangjin Yang 06813b510a Merge pull request #2571 from himanshug/gp_by_avoid_sort
avoid sort while doing groupBy merging when possible
2016-03-14 14:46:51 -07:00
Nishant 773d6fe86c Merge pull request #2646 from atomx/update-maxmind
Update com.maxmind.geoip2 to 2.6.0
2016-03-14 11:20:48 -07:00
Himanshu d51a0a0cf4 Merge pull request #2220 from gianm/appenderator-kafka
Appenderators, DataSource metadata, KafkaIndexTask
2016-03-14 13:14:36 -05:00
rasahner 2861e854f0 Merge pull request #2540 from pjain1/remove_kill
Remove extra parameter from deleteDataSourceSpecificInterval endpoint and correct exception message for invalid interval
2016-03-14 11:16:23 -05:00
Erik Dubbelboer 21b7572533 Update com.maxmind.geoip2 to 2.6.0
com.maxmind.geoip2 2.6.0 depends on com.google.http-client 1.15.0-rc (3 years old).
When trying to include other libraries in Druid that require an up to date version of com.google.http-client this causes a problem.
2016-03-12 09:44:00 +00:00
Gian Merlino 187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Gian Merlino 3d2214377d Appenderatoring.
Appenderators are a way of getting more control over the ingestion process
than a Plumber allows. The idea is that existing Plumbers could be implemented
using Appenderators, but you could also implement things that Plumbers can't do.

FiniteAppenderatorDrivers help simplify indexing a finite stream of data.

Also:
- Sink: Ability to consider itself "finished" vs "still writable".
- Sink: Ability to return the number of rows contained within the sink.
2016-03-10 17:41:50 -08:00
Gian Merlino 92c828f904 Make SegmentHandoffNotifier Closeable. 2016-03-10 16:50:37 -08:00
Gian Merlino ad5ffdf483 Nix Committers.supplierOf; Suppliers.ofInstance is good enough. 2016-03-10 16:50:37 -08:00
Gian Merlino 8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Himanshu Gupta 02dfd5cd80 update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance 2016-03-10 16:11:48 -06:00
Gian Merlino 708bc674fa Make specifying query context booleans more consistent.
Before, some needed to be strings and some needed to be real booleans. Now
they can all be either one.
2016-03-08 19:38:26 -08:00
Fangjin Yang 9c2420a1bc Merge pull request #2599 from himanshug/datasource_isolation
make coordinator db polling for list of segments more robust
2016-03-08 12:43:49 -08:00
Fangjin Yang e7018f524f Merge pull request #2598 from himanshug/handoff_timeout
optional ability to configure handoff wait timeout on realtime tasks
2016-03-08 12:43:36 -08:00
Slim Bouguerra c72438ead0 override metric name 2016-03-08 10:58:12 -06:00
Himanshu Gupta 1288784bde in coordinator db polling for available segments, ignore corrupted entries in segments table so that coordinator continues to load new segments even if there are few corrupted segment entries 2016-03-07 15:13:10 -06:00
Himanshu Gupta 0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Fangjin Yang d06c1c5c85 Merge pull request #2583 from guobingkun/fix_multiple_specs_2
update querySegmentSpec when passing query to getQueryRunner
2016-03-02 18:05:34 -08:00
David Lim 9e74772d6b Merge pull request #2574 from gianm/allostuff
Make first few allocatePendingSegment retries quiet.
2016-03-02 16:16:53 -07:00
Bingkun Guo cfe2dbf1eb Merge pull request #2580 from gianm/rtc-basePersist
RealtimeTuningConfig: Use different default basePersistDirectory per instance.
2016-03-02 16:56:49 -06:00
Bingkun Guo 4a58462fc7 update querySegmentSpec when passing query to getQueryRunner
After finding the FireChief for a specific partition, Druid will need to find the specific queryRunner for each segment being queried by passing the query to FireChief. Currently Druid is passing the original query that contains all the segments need to be queried, it's possible that fireChief.getQueryRunner(query) returns more than 1 queryRunner because query.getIntervals() is not specific to a single segment.

In this patch, for each segment being queried, Druid will update the query with its corresponding SpecificSegmentSpec.
2016-03-02 16:44:56 -06:00
Gian Merlino e65e6a49a5 RealtimeTuningConfig: Use different default basePersistDirectory per instance. 2016-03-02 13:57:53 -08:00
Gian Merlino 004028b887 Make first few allocatePendingSegment retries quiet.
Some light retrying can happen during normal operation (SELECT -> INSERT races) and the
ensuing log messages would be scary for users.
2016-03-02 13:40:29 -08:00
Fangjin Yang 612e327426 Merge pull request #2581 from gianm/fix-deadlock
CliPeon: Fix deadlock on startup by eagerly creating ExecutorLifecycle, ChatHandlerResource.
2016-03-02 11:37:49 -08:00
Gian Merlino 7557eb2800 CliPeon: Fix deadlock on startup by eagerly creating ExecutorLifecycle, ChatHandlerResource.
See stack traces here, from current master: https://gist.github.com/gianm/bd9a66c826995f97fc8f

1. The thread "qtp925672150-62" holds the lock on InternalInjectorCreator.class,
   used by Scopes.SINGLETON, and wants the lock on "handlers" in Lifecycle.addMaybeStartHandler
   called by DiscoveryModule.getServiceAnnouncer.
2. The main thread holds the lock on "handlers" in Lifecycle.addMaybeStartHandler, which it
   took because it's trying to add the ExecutorLifecycle to the lifecycle. main is trying
   to get the InternalInjectorCreator.class lock because it's running ExecutorLifecycle.start,
   which does some Jackson deserialization, and Jackson needs that lock in order to inject
   stuff into the Task it's deserializing.

This patch eagerly instantiates ChatHandlerResource (which I believe is what's trying to
create the ServiceAnnouncer in the qtp925672150-62 jetty thread) and the ExecutorLifecycle.
2016-03-02 10:53:42 -08:00
Gian Merlino 102fc92120 SQLMetadataConnector: Fix overzealous retries that were preventing EntryExistsException from making it out. 2016-03-01 17:20:33 -08:00
Fangjin Yang 9340cae985 Merge pull request #2457 from bjozet/docs/fixes
Default value for maxRowsInMemory
2016-03-01 07:43:26 -08:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Bingkun Guo 4edcb1b861 Refactor FireChief + UTs for RealtimeManagerTest
Add tests that verify whether RealtimeManager is querying the correct FireChief for a specific partition
make FireChief static and package private, add latches in the UT
2016-02-29 14:41:10 -06:00
Eric Tschetter 68631d89e9 Allow realtime nodes to have multiple shards of the same datasource 2016-02-29 12:30:25 -06:00
Parag Jain 6b3c96c63a better exception for invalid interval 2016-02-26 10:02:38 -06:00