Commit Graph

2644 Commits

Author SHA1 Message Date
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Parag Jain 948b19a088 do not silently ingnore rows 2016-03-16 09:30:19 -05:00
Fangjin Yang ec949d76e3 Merge pull request #2655 from navis/hint-coordinator-client
Add hint message for missing `druid.selectors.coordinator.serviceName`
2016-03-14 20:57:40 -07:00
Jonathan Wei 5ec5ac92c6 Merge pull request #2382 from himanshug/broker_segment_tier_selection
at broker, if configured, only add segments from specific tiers to the timeline
2016-03-14 16:53:06 -07:00
navis.ryu 83e1d5d7bf Add hint message for missing `druid.selectors.coordinator.serviceName` 2016-03-15 08:39:07 +09:00
Fangjin Yang 06813b510a Merge pull request #2571 from himanshug/gp_by_avoid_sort
avoid sort while doing groupBy merging when possible
2016-03-14 14:46:51 -07:00
Nishant 773d6fe86c Merge pull request #2646 from atomx/update-maxmind
Update com.maxmind.geoip2 to 2.6.0
2016-03-14 11:20:48 -07:00
Himanshu d51a0a0cf4 Merge pull request #2220 from gianm/appenderator-kafka
Appenderators, DataSource metadata, KafkaIndexTask
2016-03-14 13:14:36 -05:00
rasahner 2861e854f0 Merge pull request #2540 from pjain1/remove_kill
Remove extra parameter from deleteDataSourceSpecificInterval endpoint and correct exception message for invalid interval
2016-03-14 11:16:23 -05:00
Erik Dubbelboer 21b7572533 Update com.maxmind.geoip2 to 2.6.0
com.maxmind.geoip2 2.6.0 depends on com.google.http-client 1.15.0-rc (3 years old).
When trying to include other libraries in Druid that require an up to date version of com.google.http-client this causes a problem.
2016-03-12 09:44:00 +00:00
Gian Merlino 187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Gian Merlino 3d2214377d Appenderatoring.
Appenderators are a way of getting more control over the ingestion process
than a Plumber allows. The idea is that existing Plumbers could be implemented
using Appenderators, but you could also implement things that Plumbers can't do.

FiniteAppenderatorDrivers help simplify indexing a finite stream of data.

Also:
- Sink: Ability to consider itself "finished" vs "still writable".
- Sink: Ability to return the number of rows contained within the sink.
2016-03-10 17:41:50 -08:00
Gian Merlino 92c828f904 Make SegmentHandoffNotifier Closeable. 2016-03-10 16:50:37 -08:00
Gian Merlino ad5ffdf483 Nix Committers.supplierOf; Suppliers.ofInstance is good enough. 2016-03-10 16:50:37 -08:00
Gian Merlino 8a11161b20 Plumbers: Move plumber.add out of try/catch for ParseException.
The incremental indexes handle that now so it's not necessary.

Also, add debug logging and more detailed exceptions to the incremental
indexes for the case where there are parse exceptions during aggregation.
2016-03-10 16:39:26 -08:00
Himanshu Gupta 02dfd5cd80 update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance 2016-03-10 16:11:48 -06:00
Gian Merlino 708bc674fa Make specifying query context booleans more consistent.
Before, some needed to be strings and some needed to be real booleans. Now
they can all be either one.
2016-03-08 19:38:26 -08:00
Fangjin Yang 9c2420a1bc Merge pull request #2599 from himanshug/datasource_isolation
make coordinator db polling for list of segments more robust
2016-03-08 12:43:49 -08:00
Fangjin Yang e7018f524f Merge pull request #2598 from himanshug/handoff_timeout
optional ability to configure handoff wait timeout on realtime tasks
2016-03-08 12:43:36 -08:00
Slim Bouguerra c72438ead0 override metric name 2016-03-08 10:58:12 -06:00
Himanshu Gupta 1288784bde in coordinator db polling for available segments, ignore corrupted entries in segments table so that coordinator continues to load new segments even if there are few corrupted segment entries 2016-03-07 15:13:10 -06:00
Himanshu Gupta 0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Fangjin Yang d06c1c5c85 Merge pull request #2583 from guobingkun/fix_multiple_specs_2
update querySegmentSpec when passing query to getQueryRunner
2016-03-02 18:05:34 -08:00
David Lim 9e74772d6b Merge pull request #2574 from gianm/allostuff
Make first few allocatePendingSegment retries quiet.
2016-03-02 16:16:53 -07:00
Bingkun Guo cfe2dbf1eb Merge pull request #2580 from gianm/rtc-basePersist
RealtimeTuningConfig: Use different default basePersistDirectory per instance.
2016-03-02 16:56:49 -06:00
Bingkun Guo 4a58462fc7 update querySegmentSpec when passing query to getQueryRunner
After finding the FireChief for a specific partition, Druid will need to find the specific queryRunner for each segment being queried by passing the query to FireChief. Currently Druid is passing the original query that contains all the segments need to be queried, it's possible that fireChief.getQueryRunner(query) returns more than 1 queryRunner because query.getIntervals() is not specific to a single segment.

In this patch, for each segment being queried, Druid will update the query with its corresponding SpecificSegmentSpec.
2016-03-02 16:44:56 -06:00
Gian Merlino e65e6a49a5 RealtimeTuningConfig: Use different default basePersistDirectory per instance. 2016-03-02 13:57:53 -08:00
Gian Merlino 004028b887 Make first few allocatePendingSegment retries quiet.
Some light retrying can happen during normal operation (SELECT -> INSERT races) and the
ensuing log messages would be scary for users.
2016-03-02 13:40:29 -08:00
Fangjin Yang 612e327426 Merge pull request #2581 from gianm/fix-deadlock
CliPeon: Fix deadlock on startup by eagerly creating ExecutorLifecycle, ChatHandlerResource.
2016-03-02 11:37:49 -08:00
Gian Merlino 7557eb2800 CliPeon: Fix deadlock on startup by eagerly creating ExecutorLifecycle, ChatHandlerResource.
See stack traces here, from current master: https://gist.github.com/gianm/bd9a66c826995f97fc8f

1. The thread "qtp925672150-62" holds the lock on InternalInjectorCreator.class,
   used by Scopes.SINGLETON, and wants the lock on "handlers" in Lifecycle.addMaybeStartHandler
   called by DiscoveryModule.getServiceAnnouncer.
2. The main thread holds the lock on "handlers" in Lifecycle.addMaybeStartHandler, which it
   took because it's trying to add the ExecutorLifecycle to the lifecycle. main is trying
   to get the InternalInjectorCreator.class lock because it's running ExecutorLifecycle.start,
   which does some Jackson deserialization, and Jackson needs that lock in order to inject
   stuff into the Task it's deserializing.

This patch eagerly instantiates ChatHandlerResource (which I believe is what's trying to
create the ServiceAnnouncer in the qtp925672150-62 jetty thread) and the ExecutorLifecycle.
2016-03-02 10:53:42 -08:00
Gian Merlino 102fc92120 SQLMetadataConnector: Fix overzealous retries that were preventing EntryExistsException from making it out. 2016-03-01 17:20:33 -08:00
Fangjin Yang 9340cae985 Merge pull request #2457 from bjozet/docs/fixes
Default value for maxRowsInMemory
2016-03-01 07:43:26 -08:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Bingkun Guo 4edcb1b861 Refactor FireChief + UTs for RealtimeManagerTest
Add tests that verify whether RealtimeManager is querying the correct FireChief for a specific partition
make FireChief static and package private, add latches in the UT
2016-02-29 14:41:10 -06:00
Eric Tschetter 68631d89e9 Allow realtime nodes to have multiple shards of the same datasource 2016-02-29 12:30:25 -06:00
Parag Jain 6b3c96c63a better exception for invalid interval 2016-02-26 10:02:38 -06:00
Fangjin Yang 29d29ba98d Merge pull request #2263 from jon-wei/flex_dims3
Allow IncrementalIndex to store Long/Float dimensions
2016-02-25 17:23:02 -08:00
Gian Merlino b331fb4a83 Fix parsing of druid.indexer.server.maxChatRequests. 2016-02-25 14:47:15 -08:00
Parag Jain b82b487f20 remove extra kill parameter 2016-02-24 17:16:18 -06:00
jon-wei c17ce02467 Allow IncrementalIndex to store Long/Float dimensions 2016-02-24 13:51:57 -08:00
Himanshu Gupta a3b37e9225 In persistAndMerge, increase the scope of try-catch block so that any exception while persisting hydrants is caught and consequently that sink is abandoned or the task will forever wait for handoff to happen. 2016-02-23 22:22:33 -06:00
Nishant 6c9e1a28ad Merge pull request #2519 from gianm/unparseable-handling
Better handling of ParseExceptions.
2016-02-24 04:46:29 +05:30
Fangjin Yang 93540c0631 Merge pull request #2503 from gianm/jetty-qos
Add druid.indexer.server.maxChatRequests for QoS; deprecate separate ports.
2016-02-23 10:35:53 -08:00
Gian Merlino 3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Fangjin Yang 0c984f9e32 Merge pull request #2109 from himanshug/segments_in_delta_ingestion
idempotent batch delta ingestion
2016-02-22 14:00:45 -08:00
Fangjin Yang 3bdd757024 Merge pull request #1773 from b-slim/log_details
Adding downstream source when throwing QueryInterruptedException
2016-02-22 10:16:07 -08:00
Himanshu Gupta 21b0b8a07d new coordinator endpoint to get list of used segment given a dataSource and list of intervals 2016-02-21 23:17:58 -06:00
Slim Bouguerra 77925cc061 adding downstream source of QueryInterruptedException 2016-02-20 13:05:14 -06:00
Gian Merlino 23c993c9e7 Add druid.indexer.server.maxChatRequests for QoS; deprecate separate ports.
- Add druid.indexer.server.maxChatRequests, which sets up a QoSFilter on the main Jetty server.
- Deprecate druid.indexer.runner.separateIngestionEndpoint
- Deprecate druid.indexer.server.chathandler.*
2016-02-19 13:36:09 -08:00
Gian Merlino 243ac5399b Harmonize realtime indexing loop across the task and standalone nodes.
- Both now catch ParseExceptions on plumber.add (see https://groups.google.com/d/topic/druid-user/wmiRDvx2RvM/discussion)
- Standalone now treats IndexSizeExceededException as fatal (previously only the task did)
2016-02-19 07:34:15 -08:00