druid

Commit Graph

Author	SHA1	Message	Date
Jonathan Wei	364bf9d1f9	Fix non org.apache.druid files and add package name checkstyle rule (#6367 ) * Fix non org.apache.druid files and add package name checkstyle rule * PR comment	2018-09-21 17:58:19 -07:00
QiuMM	255214cbe6	correct variable name in KafkaSupervisor (#6354 )	2018-09-20 16:22:03 -07:00
Jonathan Wei	8972244c68	Mutual TLS support (#6076 ) * Mutual TLS support * Kafka test fixes * TeamCity fix * Split integration tests * Use localhost DOCKER_IP * Increase server thread count * Increase SSL handshake timeouts * Add broken pipe retries, use injected client config params * PR comments, Rat license check exclusion	2018-09-19 09:56:15 -07:00
Joshua Sun	4fafc2ccc9	fixes race condition in kafkasupervisor (#6304 ) * fixes race condition in kafkasupervisor * async verify checkpoints * fixes race condition in kafkasupervisor * replace commonly used methods with variables * remove countdownlatch import * reformat * fixes	2018-09-18 22:37:22 -07:00
Roman Leventov	0c4bd2b57b	Prohibit some Random usage patterns (#6226 ) * Prohibit Random usage patterns * Fix FlattenJSONBenchmarkUtil	2018-09-14 13:35:51 -07:00
Roman Leventov	d50b69e6d4	Prohibit LinkedList (#6112 ) * Prohibit LinkedList * Fix tests * Fix * Remove unused import	2018-09-13 18:07:06 -07:00
Clint Wylie	91a37c692d	'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234 ) * 'suspend' and 'resume' support for kafka indexing service changes: * introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shut down the supervisor and its tasks, update its `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively. * `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise. * `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors * Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved its functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbiage for something that effectively stops a supervisor forever * Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}` * Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate' * move overlord console status to own column in supervisor table so does not look like garbage * spacing * padding * other kind of spacing * fix rebase fail * fix more better * all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests * fix log	2018-09-13 14:42:18 -07:00
Gian Merlino	d6cbdf86c2	Broker backpressure. (#6313 ) * Broker backpressure. Adds a new property "druid.broker.http.maxQueuedBytes" and a new context parameter "maxQueuedBytes". Both represent a maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Fixes #4933. * Fix query context doc.	2018-09-10 09:33:29 -07:00
Clint Wylie	e6e068ce60	Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129 ) * resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask * address review comments * changes due to review * merge fail	2018-09-07 13:17:49 -07:00
Jonathan Wei	60cbc64472	Use PasswordProvider, fix info on initial passwords in basic security extension docs (#6303 ) * Fix info on initial passwords in basic security extension docs * Use PasswordProvider * Compile fix	2018-09-05 17:07:16 -07:00
Jonathan Wei	d0fb83760e	Fix PostgreSQLConnectorConfig binding (#6273 )	2018-08-31 14:18:29 -07:00
Dayue Gao	951b36e2bc	BytesFullResponseHandler should only consume readableBytes of ChannelBuffer (#6270 )	2018-08-30 20:22:08 -07:00
Gian Merlino	431d3d8497	Rename io.druid to org.apache.druid. (#6266 ) * Rename io.druid to org.apache.druid. * Fix META-INF files and remove some benchmark results. * MonitorsConfig update for metrics package migration. * Reorder some dimensions in inner queries for some reason. * Fix protobuf tests.	2018-08-30 09:56:26 -07:00
Jonathan Wei	c9a27e3e8e	Don't let catch/finally suppress main exception in IncrementalPublishingKafkaIndexTaskRunner (#6258 )	2018-08-28 16:12:02 -07:00
Jihoon Son	bda5a8a95e	Fix NPE in KafkaSupervisor.checkpointTaskGroup (#6206 ) * Fix NPE in KafkaSupervisor.checkpointTaskGroup * address comments * address comment	2018-08-26 22:23:33 -07:00
Jihoon Son	64d33eef7e	Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup (#6207 ) * Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup * fix npe * add taskRunner.getRunningTasks()	2018-08-26 19:59:01 -06:00
Gian Merlino	28e6ae3664	SQL: Finalize aggregations for inner queries when necessary. (#6221 ) * SQL: Finalize aggregations for inner queries when necessary. Fixes #5779. * Fixed test method name.	2018-08-25 13:56:23 -07:00
Ryan Plessner	9c500fb69f	Add PostgreSQLConnectorConfig to expose SSL configuration options (#6181 ) * Add PostgreSQLConnectorConfig to expose SSL configuration options for the Postgres Metadata Storage module. * Fix checkstyle violations and add license header * Convert properties in the postgres docs to be the full property path and fix typo * Fix grammar in sslFactory docs	2018-08-21 16:45:27 -07:00
Benedict Jin	3647d4c94a	Make time-related variables more readable (#6158 ) * Make time-related variables more readable * Patch some improvements from the code reviewer * Remove unnecessary boxing of Long type variables	2018-08-21 15:29:40 -07:00
Benedict Jin	7d4b2d51e8	Fix assertionError at testCheckpointForInactiveTaskGroup in KafkaSupervisorTest (#6192 )	2018-08-21 11:33:45 -07:00
Jihoon Son	2bfe1b6a5a	Fix NPE for taskGroupId when rolling update (#6168 ) * Fix NPE for taskGroupId * missing changes * fix wrong annotation * fix potential race * keep baseSequenceName * make deprecated old param	2018-08-17 10:15:45 -07:00
Gian Merlino	4d2ff0f6c7	Serde test for JdbcExtractionNamespace. (#6186 )	2018-08-17 11:54:06 -04:00
Gian Merlino	5ce3185b9c	Fix three bugs with segment publishing. (#6155 ) * Fix three bugs with segment publishing. 1. In AppenderatorImpl: always use a unique path if requested, even if the segment was already pushed. This is important because if we don't do this, it causes the issue mentioned in #6124. 2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return a "not published" result instead of throwing an exception, when there was one metadata update failure, followed by some random exception. This is done by resetting the AtomicBoolean that tracks what case we're in, each time the callback runs. 3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false publish result. Skip killing if we just got some exception. The reason for this is that we want to avoid killing segments if they are in an unknown state. Two other changes to clarify the contracts a bit and hopefully prevent future bugs: 1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it more similar to announceHistoricalSegments. 2. Make it explicit, at multiple levels of javadocs, that a "false" publish result must indicate that the publish _definitely_ did not happen. Unknown states must be exceptions. This helps BaseAppenderatorDriver do the right thing. * Remove javadoc-only import. * Updates. * Fix test. * Fix tests.	2018-08-15 13:55:53 -07:00
Alexander Saydakov	c47032d566	Implemented makeAggregateCombiner() in ArrayOfDoublesSketchAggregatorFactory (#6093 ) * implemented makeAggregateCombiner() * test for makeAggregateCombiner() * license, style fix	2018-08-13 14:19:11 -07:00
Jihoon Son	a7ca4589dd	Fix race in testCheckpointForUnknownTaskGroup() of KafkaSupervisorTest (#6153 )	2018-08-11 08:26:46 -07:00
Jihoon Son	ecee3e0a24	Further optimize memory for Travis jobs (#6150 ) * Further optimize memory for Travis jobs * fix build * sudo false	2018-08-10 22:03:36 -07:00
Gian Merlino	3525d4059e	Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108 ) * Cache: Add maxEntrySize config. The idea is this makes it more feasible to cache query types that can potentially generate large result sets, like groupBy and select, without fear of writing too much to the cache per query. Includes a refactor of cache population code in CachingQueryRunner and CachingClusteredClient, such that they now use the same CachePopulator interface with two implementations: one for foreground and one for background. The main reason for splitting the foreground / background impls is that the foreground impl can have a more effective implementation of maxEntrySize. It can stop retaining subvalues for the cache early. * Add CachePopulatorStats. * Fix whitespace. * Fix docs. * Fix various tests. * Add tests. * Fix tests. * Better tests * Remove conflict markers. * Fix licenses.	2018-08-07 10:23:15 -07:00
Jihoon Son	56ab4363ea	Native parallel batch indexing without shuffle (#5492 ) * Native parallel indexing without shuffle * fix build * fix ci * fix ingestion without intervals * fix retry * fix retry * add it test * use chat handler * fix build * add docs * fix ITUnionQueryTest * fix failures * disable metrics reporting * working * Fix split of static-s3 firehose * Add endpoints to supervisor task and a unit test for endpoints * increase timeout in test * Added doc * Address comments * Fix overlapping locks * address comments * Fix static s3 firehose * Fix test * fix build * fix test * fix typo in docs * add missing maxBytesInMemory to doc * address comments * fix race in test * fix test * Rename to ParallelIndexSupervisorTask * fix teamcity * address comments * Fix license * addressing comments * addressing comments * indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator * Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner * Add more javadocs * use StringUtils.nonStrictFormat for logging * fix typo and remove unused class * fix tests * change package * fix strict build * tmp * Fix overlord api according to the recent change in master * Fix it test	2018-08-06 23:59:42 -07:00
Jihoon Son	ef2d6e9118	Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 (#6086 ) * Fix TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 * Make the priority of taskLock nullable * fix test * fix build	2018-08-03 17:13:44 -07:00
Nishant Bangarwa	75c8a87ce1	Part 2 of changes for SQL Compatible Null Handling (#5958 ) * Part 2 of changes for SQL Compatible Null Handling * Review comments - break lines longer than 120 characters * review comments * review comments * fix license * fix test failure * fix CalciteQueryTest failure * Null Handling - Review comments * review comments * review comments * fix checkstyle * fix checkstyle * remove unrelated change * fix test failure * fix failing test * fix travis failures * Make StringLast and StringFirst aggregators nullable and fix travis failures	2018-08-02 08:20:25 -07:00
Roman Leventov	0754d78a2e	Prohibit Lists.newArrayList() with a single argument (#6068 ) * Prohibit Lists.newArrayList() with a single argument * Test fixes * Add Javadoc to Node constructor	2018-07-31 20:09:10 -07:00
Stas Sukhanov	b0ecfee1ab	Fix ClassNotFoundException in druid-kerberos extension (#4776 ) Class org.apache.hadoop.conf.Configuration inside extensions should be used with caution. By default, the configuration uses the context class loader of the current thread, which is set to the class loader used to load the application. Because of the isolation between the application and extensions, we must explicitly set the class loader to the extension class loader to be able to load classes specified in the hadoop configuration file.	2018-07-27 16:23:09 -07:00
Benedict Jin	331a0afb98	Remove redundant type parameters and enforce some other style and inspection rules (#5980 ) * Various changes about druid-services module * Patch improvements from reviewer * Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile * Fix ArraysAsListWithZeroOrOneArgument * Fix conflict * Fix ToArrayCallWithZeroLengthArrayArgument * Fix AliEqualsAvoidNull * Remove blank line * Remove unused import clauses * Fix code style in TopNQueryRunnerTest * Fix conflict * Don't use Collections.singletonList when converting the type of array type * Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase * Roll back the latest commit * Add java.io.File#toURL() into druid-forbidden-apis * Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord * Add a new regexp element into stylecode xml file * Fix style error for new regexp * Set the level of ArraysAsListWithZeroOrOneArgument as WARNING * Fix style error for new regexp * Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile * Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR * Add toArray(new Object[0]) regexp into checkstyle config file & fix them * Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it * Add a comment for string equals regexp in checkstyle config * Fix code format * Add RedundantTypeArguments as ERROR level inspection * Fix cannot resolve symbol datasource	2018-07-27 16:56:49 -05:00
Jihoon Son	1524af703d	Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050 )	2018-07-27 10:43:32 -07:00
Jonathan Wei	0590293538	Add comment and code tweak to Basic HTTP Authenticator (#6029 )	2018-07-20 20:35:14 -07:00
Jihoon Son	b7d42edb0f	Check the kafka topic when comparing checkpoints from tasks and the one stored in metastore (#6015 )	2018-07-20 11:20:23 -07:00
Jihoon Son	c48aa74a30	Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996 ) * Fix NPE while handling CheckpointNotice * fix code style * Fix test * fix test * add a log for creating a new taskGroup * fix backward compatibility in KafkaIOConfig	2018-07-13 17:14:57 -07:00
Gian Merlino	04ea3c9f8c	Update license headers. (#5976 ) * Update license headers. For compliance with http://www.apache.org/legal/src-headers.html. * More license adjustments. * Fix mistakenly edited package line.	2018-07-11 09:55:18 -07:00
Gian Merlino	948e73da77	Extend various test timeouts. (#5978 ) False failures on Travis due to spurious timeout (in turn due to noisy neighbors) is a bigger problem than legitimate failures taking too long to time out. So it makes sense to extend timeouts.	2018-07-10 13:02:14 -07:00
Surekha	9bece8ce1e	Prevent KafkaSupervisor NPE in generateSequenceName (#5900 ) (#5902 ) * Prevent KafkaSupervisor NPE in checkPendingCompletionTasks (#5900) * throw IAE in generateSequenceName if groupId not found in taskGroups * add null check in checkPendingCompletionTasks * Add warn log in checkPendingCompletionTasks * Address PR comments Replace warn with error log * Address PR comments * change signature of generateSequenceName to take a TaskGroup object instead of int * Address comments * Remove unnecessary method from KafkaSupervisorTest	2018-07-04 23:45:42 -07:00
Jihoon Son	1ccabab98e	Fix the broken Appenderator contract in KafkaIndexTask (#5905 ) * Fix broken Appenderator contract in KafkaIndexTask * fix build * add publishFuture * reuse sequenceToUse if possible	2018-07-03 13:31:29 -07:00
Jihoon Son	b76a056c14	Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner (#5907 ) * Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner * fix lock and add comments	2018-06-30 17:20:41 -07:00
Surekha	0f429298cf	Fix Kafka Indexing task pause forever if no events in taskDuration (#5656 ) (#5899 ) * Fix Kafka Indexing task pause forever (#5656) * Fix Nullpointer Exception in overlord if taskGroups does not contain the groupId * If the endOffset is same as startOffset, still let the task resume instead of returning endOffsets early which causes the tasks to pause forever and ultimately fail on timeout * Address PR comment *Remove the null check and do not return null from generateSequenceName	2018-06-25 19:29:36 -07:00
Jihoon Son	8c5ded0fad	Splitting KafkaIndexTask for better code maintenance (#5854 ) * Refactoring KafkaIndexTask for better code maintenance * fix bug * fix test * add annotation * fix checkstyle * remove SetEndOffsetsResult	2018-06-22 13:00:03 -07:00
Surekha	8619adb5b9	Improve task retrieval APIs on Overlord (#5801 ) * Add the new tasks api in overlordResource It takes 4 optional query params * state(pending/running/waiting/complete) * dataSource * interval (applies to completed tasks) * maxCompletedTasks (applies to completed tasks) If all params are null, the api returns all the tasks * Add the state to each task returned by tasks endpoint * divide active tasks into waiting, pending or running * Add more unit tests * Add UNKNOWN state to TaskState * Fix the authorization calls * WIP: PR comments Added new class to capture task info for caching Other refactoring * Refactoring : move TaskStatus class to druid-api so it can be accessed within server And other related classes like TaskState and TaskStatusPlus are in api * Remove unused class and apis accessing it * Add a separate cache for recently completed tasks This is to mainly capture the task type from payload * Ignore a test * Add a RuntimeTaskState to encompass all states a task can be in * Revert "Add a RuntimeTaskState to encompass all states a task can be in" This reverts commit `2a527a0731`. * Fix wrong api call * Fix and unignore tests * Remove waiting,pending state from TaskState * Add RunnerTaskState * Missed the annotation runnerStatusCode * Fix the creationTime * Fix the createdTime and queueInsertionTime for running/active tasks * Clean up tests * Add javadocs * Potentially fix the teamcity build * Address PR comments Get rid of TaskInfoBuilder Make TaskInfoMapper static nested class Other changes fix import in MaterializedViewSupervisor after merge * Address PR comments on * Replace global cache with local map * combine multiple queries into one * Removed unused code * Fix unit tests Fix a bug in securedTaskStatusPlus * Remove getRecentlyFinishedTaskStatuses method Change TaskInfoMapper signature to add generic type * Address PR comments * Passed datasource as argument to be used in sql query * Other minor fixes * Address PR comments Some minor changes, rename method, spacing changes Add early auth check if datasource is not null * Fix test case * Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage * Add TaskLocation to Anytask object * Address PR comments * Fix a bug in test case causing ClassCastException	2018-06-19 11:34:59 -07:00
Dylan Wylie	1f700bb880	Suppress JsonPath exceptions in AvroFlattener (#5793 ) Re: #5791 - Make the AvroFlattenerMaker consistent with the JSONFlattenerMaker	2018-06-14 17:38:15 -07:00
Jonathan Wei	dc67b77ec2	Immediately send 401 on basic HTTP authentication failure (#5856 ) * Immediately send 401 on basic HTTP authentication failure * Add unit tests	2018-06-14 10:23:10 -07:00
Gian Merlino	0ae4aba4e2	HdfsDataSegmentPusher: Close tmpIndexFile before copying it. (#5873 ) It seems that copy-before-close works OK on HDFS, but it doesn't work on all filesystems. In particular, we observed this not working properly with Google Cloud Storage. And anyway, it's better hygiene to close files before attempting to copy them somewhere else.	2018-06-12 08:58:48 +01:00
Jonathan Wei	684b5d18c1	Moving averages for ingestion row stats (#5748 ) * Moving averages for ingestion row stats * PR comments * Make RowIngestionMeters extensible * test and checkstyle fixes * More PR comments * Fix metrics * Add some comments * PR comments * Comments	2018-06-05 09:08:57 -07:00
Jihoon Son	67ff7dacbd	Support server-side encryption for s3 (#5740 ) * Support server-side encryption for s3 * fix teamcity * typo * address comments * Refactoring configuration injection * fix doc * fix doc	2018-05-28 20:22:08 -07:00
Atul Mohan	1b9611a60e	Local indexing from RDBMS (#5441 ) * Local indexing from RDBMS * Fix content * Remove pom changes * Remove extraneous space * Add tests and update documentation * Fix comments * Fix docs * Fix build related issue * Handle invalid strings * Make target database independent of metadata storage * Add firehose connector * Fix accessibility * Add docs * Remove unused def * Remove lazy instantiation of jsoniterator * Move unused changes * Move unused changes * Fix build * Make Sqlfirehose method private	2018-05-22 12:33:01 +09:00
Alexander Saydakov	15864434be	ArrayOfDoublesSketch module (#5148 ) * ArrayOfDoublesSketch module * UTF-8 fix * javadoc, style fixes * more style fixes * null key selector fix * more style fixes * removed @Override, strict compiler doesn't like it * removed @Override, strict compiler doesn't like it * IndexedInts is not autoclosable? removed one more @Override * synchronized with upstream master * removed unused imports * addressed review points * null fix * addressed review points * IAE from druid package * synchronized aggregate() and get() * use locks per buffer position * corrected javadoc * style fixes * added lock and narrowed the scope * addressed review comments * conflict resolution went wrong * addressed review comments * javadoc * javadoc links * fully qualified name since there is no import for this class * addressed review points * style fix * StandardCharsets.UTF_8 * addressed review points * added @Override * added equals and hashCode tests for post aggs * formatting * suppress warnings * optimal IndexedInts iteration * suppress SelfEquals * added comments about getClass() in equals()	2018-05-13 15:48:00 +03:00
Jonathan Wei	7a1faa332f	Fix KerberosAuthenticator serverPrincipal host replacement (#5766 )	2018-05-10 11:04:49 +05:30
Kirill Kozlov	67d0b0ee42	Add taskType dimension to task metrics (#5664 )	2018-05-07 09:42:26 -07:00
Slim Bouguerra	8aa8d9fa5b	Kerberos Spnego Authentication Router Issue (#5706 ) * Adding decoration method to proxy servlet Change-Id: I872f9282fb60bfa20524271535980a36a87b9621 * moving the proxy request decoration to authenticators Change-Id: I7f94b9ff5ecf08e8abf7169b58bc410f33148448 * added docs Change-Id: I901543e52f0faf4666bfea6256a7c05593b1ae70 * use the authentication result to decorate request Change-Id: I052650de9cd02b4faefdbcdaf2332dd3b2966af5 * adding authenticated by name Change-Id: I074d2933460165feeddb19352eac9bd0f96f42ca * ensure that authenticator is not null Change-Id: Idb58e308f90db88224a06f3759114872165b24f5 * fix types and minor bug Change-Id: I6801d49a05d5d8324406fc0280286954eb66db10 * fix typo Change-Id: I390b12af74f44d760d0812a519125fbf0df4e97b * use actual type names Change-Id: I62c3ee763363781e52809ec912aafd50b8486b8e * set authenticatedBy to null for AuthenticationResults created by Escalator. Change-Id: I4a675c372f59ebd8a8d19c61b85a1e4bf227a8ba	2018-05-05 20:33:51 -07:00
Dylan Wylie	2c5f0038fd	Make lookup offheap buffer configurable (#5696 ) * Make lookup offheap buffer configurable Fixes #3663 * Address comments * Update docs * Update docs	2018-05-04 10:00:55 -07:00
Surekha	13c616ba24	'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583 ) * This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting. * The default value is 1/3(Runtime.maxMemory()) * To maintain the current behaviour set 'maxBytesInMemory' to -1 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist * Fix check style and remove a comment * Add overlord unsecured paths to coordinator when using combined service (#5579) * Add overlord unsecured paths to coordinator when using combined service * PR comment * More error reporting and stats for ingestion tasks (#5418) * Add more indexing task status and error reporting * PR comments, add support in AppenderatorDriverRealtimeIndexTask * Use TaskReport instead of metrics/context * Fix tests * Use TaskReport uploads * Refactor fire department metrics retrieval * Refactor input row serde in hadoop task * Refactor hadoop task loader names * Truncate error message in TaskStatus, add errorMsg to task report * PR comments * Allow getDomain to return disjointed intervals (#5570) * Allow getDomain to return disjointed intervals * Indentation issues * Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551) * Adding feature thetaSketchConstant to do some set operation in PostAggregator * Updated review comments for PR #5551 - Adding thetaSketchConstant * Fixed CI build issue * Updated review comments 2 for PR #5551 - Adding thetaSketchConstant * Fix taskDuration docs for KafkaIndexingService (#5572) * With incremental handoff the changed line is no longer true. * Add doc for automatic pendingSegments (#5565) * Add missing doc for automatic pendingSegments * address comments * Fix indexTask to respect forceExtendableShardSpecs (#5509) * Fix indexTask to respect forceExtendableShardSpecs * add comments * Deprecate spark2 profile in pom.xml (#5581) Deprecated due to https://github.com/druid-io/druid/pull/5382 * CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586) Also switch various firehoses to the new method. Fixes #5585. * This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting. * The default value is 1/3(Runtime.maxMemory()) * To maintain the current behaviour set 'maxBytesInMemory' to -1 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist * Address code review comments * Fix the coding style according to druid conventions * Add more javadocs * Rename some variables/methods * Other minor issues * Address more code review comments * Some refactoring to put defaults in IndexTaskUtils * Added check for maxBytesInMemory in AppenderatorImpl * Decrement bytes in abandonSegment * Test unit test for multiple sinks in single appenderator * Fix some merge conflicts after rebase * Fix some style checks * Merge conflicts * Fix failing tests Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex * Address PR comments * Put defaults for maxRows and maxBytes in TuningConfig * Change/add javadocs * Refactoring and renaming some variables/methods * Fix TeamCity inspection warnings * Added maxBytesInMemory config to HadoopTuningConfig * Updated the docs and examples * Added maxBytesInMemory config in docs * Removed references to maxRowsInMemory under tuningConfig in examples * Set maxBytesInMemory to 0 until used Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfig and set to part of max jvm memory when ingestion task starts * Update toString in KafkaSupervisorTuningConfig * Use correct maxBytesInMemory value in AppenderatorImpl * Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory Experimenting with various defaults, 1/3 jvm memory causes OOM * Update docs to correct maxBytesInMemory default value * Minor to rename and add comment * Add more details in docs * Address new PR comments * Address PR comments * Fix spelling typo	2018-05-03 16:25:58 -07:00
Jihoon Son	d4311b4a5a	Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702 ) * Support enablePathStyleAccess and disableChunkedEncoding for aws client * add an option for forceGlobalBucketAccessEnabled * add missing doc	2018-05-02 10:45:38 -07:00
David Lim	8ec2d2fe18	Use unique segment paths for Kafka indexing (#5692 ) * support unique segment file paths * forbiddenapis * code review changes * code review changes * code review changes * checkstyle fix	2018-04-29 21:59:48 -07:00
Jonathan Wei	513fab77d9	Lazy init of fullyQualifiedStorageDirectory in HDFS pusher (#5684 ) * Lazy init of fullyQualifiedStorageDirectory in HDFS pusher * Comment * Fix test * PR comments	2018-04-28 21:07:39 -07:00
Roman Leventov	9be000758d	Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335 ) * Refactor index merging, replace Rowboats with RowIterators and RowPointers * Add javadocs * Fix a bug in QueryableIndexIndexableAdapter * Fixes * Remove unused declarations * Remove unused GenericColumn.isNull() method * Fix test * Address comments * Rearrange some code in MergingRowIterator for more clarity * Self-review * Fix style * Improve docs * Fix docs * Rename IndexMergerV9.writeDimValueAndSetupDimConversion to setUpDimConversion() * Update Javadocs * Minor fixes * Doc fixes, more code comments, cleanup of RowCombiningTimeAndDimsIterator * Fix doc link	2018-04-27 17:34:32 -07:00
Roman Leventov	a3a9ada843	Add GenericWhitespace checkstyle check (#5668 )	2018-04-24 01:09:14 +05:30
Charles Allen	8e441cd142	Fix cache bug in stats module (#5650 )	2018-04-17 15:11:03 -07:00
Roman Leventov	124c89e435	Replace EmittedBatchCounter and UpdateCounter with ConcurrentAwaitableCounter (#5592 ) * Replace EmittedBatchCounter and UpdateCounter with (both not safe for concurrent increments/updates) with ConcurrentAwaitableCounter (safe for concurrent increments) * Fixes * Fix EmitterTest * Added Javadoc and make awaitCount() to throw exceptions on wrong count instead of masking errors	2018-04-13 00:07:11 -04:00
Nishant Bangarwa	80fa5094e8	Fix Kerberos Authentication failing requests without cookies and excludedPaths config. (#5596 ) * Fix Kerberos Authentication failing requests without cookies. KerberosAuthenticator was failing `First` request from the clients. After authentication we were setting the cookie properly but not setting the authenticated flag in the request. This PR fixed that. Additional Fixes - * Removing of Unused SpnegoFilterConfig - replaced by KerberosAuthenticator * Unused internalClientKeytab and principal from KerberosAuthenticator * Fix docs accordingly and add docs for configuring an escalated client. * Fix excluded path config behavior * spelling correction * Revert "spelling correction" This reverts commit `fb754b43d8`. * Revert "Fix excluded path config behavior" This reverts commit `3901047769`.	2018-04-09 20:45:35 -07:00
Gian Merlino	685f4063d4	DoublesSketchModule: Fix serde for DoublesSketchMergeAggregatorFactory. (#5587 ) Fixes #5580.	2018-04-07 06:45:55 -07:00
Gian Merlino	5ab17668c0	CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586 ) Also switch various firehoses to the new method. Fixes #5585.	2018-04-06 08:06:45 -07:00
Senthil Kumar L S	371c672828	Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551 ) * Adding feature thetaSketchConstant to do some set operation in PostAggregator * Updated review comments for PR #5551 - Adding thetaSketchConstant * Fixed CI build issue * Updated review comments 2 for PR #5551 - Adding thetaSketchConstant	2018-04-05 22:56:59 -07:00
Jonathan Wei	969342cd28	More error reporting and stats for ingestion tasks (#5418 ) * Add more indexing task status and error reporting * PR comments, add support in AppenderatorDriverRealtimeIndexTask * Use TaskReport instead of metrics/context * Fix tests * Use TaskReport uploads * Refactor fire department metrics retrieval * Refactor input row serde in hadoop task * Refactor hadoop task loader names * Truncate error message in TaskStatus, add errorMsg to task report * PR comments	2018-04-05 21:38:57 -07:00
Jonathan Wei	723f7ac550	Add support for task reports, upload reports to deep storage (#5524 ) * Add support for task reports, upload reports to deep storage * PR comments * Better name for method * Fix report file upload * Use TaskReportFileWriter * Checkstyle * More PR comments	2018-04-02 12:10:56 -07:00
Kirill Kozlov	8878a7ff94	Replace guava Charsets with native java StandardCharsets (#5545 )	2018-03-28 21:00:08 -07:00
Jihoon Son	1ad898bde2	Use the official aws-sdk instead of jet3t (#5382 ) * Use the official aws-sdk instead of jet3t * fix compile and serde tests * address comments and fix test * add http version string * remove redundant dependencies, fix potential NPE, and fix test * resolve TODOs * fix build * downgrade jackson version to 2.6.7 * fix test * resolve the last TODO * support proxy and endpoint configurations * fix build * remove debugging log * downgrade hadoop version to 2.8.3 * fix tests * remove unused log * fix it test * revert KerberosAuthenticator change * change hadoop-aws scope to provided in hdfs-storage * address comments * address comments	2018-03-21 15:36:54 -07:00
Charles Allen	58f110f7f8	Future-proof some Guava usage (#5414 ) * Future-proof some Guava usage * Use a java-util EmptyIterator instead of Guava's * Change some of the guava future handling to do manual async transforms. Guava changes transform into transformAsync by deprecating transform in ONLY Guava 19. Then its gone in 20 * Use `Collections.emptyIterator()` * Pretty formatting * Make listenable future transforms a thing in default druid * Format fix * Add forbidden guava apis * Make the ListenableFutrues.transformAsync have comments * Undo intellij bad pattern matching in comments * Futrues --> Futures * Add empty iterators forbidding * Fix extra `A` * Correct method signature * Address review comments * Finish Gian review comments * Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax	2018-03-20 08:59:33 -07:00
Roman Leventov	693e3575f9	Remove unused code and exception declarations (#5461 ) * Remove unused code and exception declarations * Address comments * Remove redundant Exception declarations * Make FirehoseFactoryV2.connect() to throw IOException again	2018-03-16 22:11:12 +01:00
Gian Merlino	fdd55538e1	SQL: Remove unused escalator, authConfig from various classes. (#5483 ) DruidPlanner.plan is responsible for checking authorization, so these objects weren't needed in as many places as they were injected.	2018-03-14 13:28:51 -07:00
Jihoon Son	9b2a25bd84	Refactor supervisorReport to be type-safe (#5479 ) * refactor supervisorReport * use primitives	2018-03-13 09:28:44 -07:00
Christoph Hösler	34f655599d	Let MySQLConnector accept all UTF charsets and recommend utf8mb4 (#5411 ) * Let MySQLConnector accept all UTF charsets and recommend utf8mb4 * Fix regex and remove newline in log statement	2018-03-13 01:16:10 -07:00
Niraja Mishra	96cebfc222	As part of this feature, implemented a new endpoint to get running tasks by datasources (#5260 ) and added datasource information as part of existing endpoint /druid/indexer/v1/runningTasks. Added junit test cases for the newly implemented API and fixed existing junit test cases. Fixed review comments - added new method getCreatedDateTimeAndDataSource into TaskStorageQueryAdapter class and formatted changed files	2018-03-12 23:48:11 -07:00
bolkedebruin	8f07a39af7	Skip OS cache on Linux when pulling segments (#5421 ) Druid relies on the page cache of Linux in order to have memory segments. However when loading segments from deep storage or rebalancing the page cache can get poisoned by segments that should not be in memory yet. This can significantly slow down Druid in case rebalancing happens as data that might not be queried often is suddenly in the page cache. This PR implements the same logic as is in Apache Cassandra and Apache Bookkeeper. Closes #4746	2018-03-08 07:54:21 -08:00
Slim	593e87637d	Inline some backward incompatible Hadoop 3.0 method (#5396 ) * Inline some backward incompatible hadoop 3.0 method Change-Id: I49aeff5412d5cdea95e30feb031b2c036d036e9a * fix build issue Change-Id: I0a42fdb83ce970d6a2d3d45f150556e45442a0ac	2018-03-07 07:58:18 -08:00
Clint Wylie	f948066710	KafkaIndexTask remove branch with unreachable code (#5434 )	2018-03-02 17:26:12 -08:00
Nishant Bangarwa	e0d456b1ba	Uniformly set Calcite systemProperties for All Unit tests (#5451 ) Fixes test failures reported in https://github.com/druid-io/druid/issues/4909. The issue is that if some test skips setting up Calcite system properties with proper encoding and loads Calcite classes that use that property, all subsequent tests in the same JVM fail. To reproduce the issue, run ExpressionsTest and CalciteQueryTest from the IDE in this order. A better fix would be to not use system properties in Calcite; this will work for now. All new Calcite unit tests need to inherit CalciteTestBase.	2018-03-01 12:56:32 -08:00
Jihoon Son	16e08c9adb	add task priority for kafka indexing (#5444 )	2018-02-28 22:29:23 -08:00
Nishant Bangarwa	219e77aeac	SQL compatible Null Handling Part - Expressions and Storage Changes (#5278 ) * SQL compatible Null Handling Part - Expressions, Storage and Dimension Selector Changes fix travis strict compilation * fix teamcity error - remove unused method * review comments * review comments * more comments * review comments * review comments * Optimize isNull method * Optimize isNull in ColumnarFloats/Longs/Doubles * review comment - separate classes for null and non-null columns fix intellij inspection * remove unused import * More Review comments * improve comment * More review comments * fix checkstyle * more review comments * review comments. fix javadoc links remove Nullable from ConstantColumnValueSelector * review comments. * satisfy teamcity inspections	2018-02-21 13:27:26 +01:00
Parag Jain	fba13d8978	time based checkpointing for Kafka Indexing Service (#5255 ) * time based checkpointing * add test and fix issue * fix comments * fix formatting * update docs	2018-02-15 20:57:02 -08:00
Jihoon Son	cd929000ca	Change early publishing to early pushing in indexTask & refactor AppenderatorDriver (#5297 ) * Fix early publishing to early pushing in batch indexing & refactor appenderatorDriver * fix compile * rename and add more javadocs * Fix conflicts * address comments * revert await executors * fix test	2018-02-14 12:48:33 -08:00
Jonathan Wei	b234a119ac	Log exceptions thrown before persist() for indexing tasks (#5374 ) * Log exceptions thrown before persist() for indexing tasks * PR comment	2018-02-13 09:20:07 -08:00
Roman Leventov	e64ffb10c2	Standardize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366 )	2018-02-07 13:24:30 -08:00
Gian Merlino	eb17fba0e2	Fix race in CoordinatorPollingBasicAuthorizerCacheManager. (#5359 ) Similar to #5344 but for the authorizer instead of the authenticator.	2018-02-06 16:45:29 -08:00
Gian Merlino	8c738c7076	Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager (#5344 ) * Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager. Both were susceptible to the following conditions: 1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the other was writing, or by writing to the file at the same time. 2. One JVM could partially write a file, then crash, leaving a truncated file. * Use StringUtils.format	2018-02-06 09:44:06 -08:00
Gian Merlino	9a62b02cb7	Extensions: Option to load classes from extension jars first. (#5321 ) The behavior is configurable through druid.extensions.useExtensionClassloaderFirst. It is useful when extensions want to load a dependency different from one provided by Druid, for example a different version of geoip or protobuf.	2018-02-06 16:14:03 +05:30
Gian Merlino	de7f28e6d9	Fix some unemitted alerts in druid-basic-security. (#5327 )	2018-02-02 11:39:21 -08:00
Jonathan Wei	c9e7c0a817	Remove Escalator jetty http client escalation method (#5322 )	2018-02-02 12:43:02 -06:00
Gian Merlino	7e02408510	Update versions to 0.13.0-SNAPSHOT. (#5323 )	2018-02-02 12:06:38 -06:00
Gian Merlino	64ee65856e	ApproximateHistogram: Skip nulls on input, and use more standard parsing code. (#5308 )	2018-01-30 12:32:56 -08:00
Himanshu	632e44c539	use reflection to call hadoop fs.rename to workaround different hadoop jar version in main and hdfs-storage extension class loader (#5296 ) * use reflection to call hadoop fs.rename to workaround different hadoop jar version in main and hdfs-storage extension class loader * find rename method recursively	2018-01-29 10:26:53 -08:00
Jonathan Wei	f6749f1229	Allow separate truststore conf for HttpEmitter (#5298 ) * Fix HttpEmitter TLS support, allow separate truststore conf * PR comment, fix tests	2018-01-26 10:46:06 -06:00
Jonathan Wei	80419752b5	Add metamx emitter, http clients, and metrics packages to druid java-util (#5289 ) * Add metamx java-util emitter, http clients, and metrics packages to druid java-util * Remove metamx java-util from pom.xml files * Checkstyle fixes * Import fix * TeamCity inspection fixes * Use slf4j, move some version defs to master pom.xml * Use parent jvm-attach-api and maven-surefire-plugin versions * Add ] to log msg, suppress inspection	2018-01-24 22:10:36 +01:00
Roman Leventov	61e6878afd	Check Javadoc reference integrity (#5279 )	2018-01-22 13:51:28 -08:00
Roman Leventov	a346bbc6f3	Enforce spacing around foreach colon with Checkstyle (#5271 )	2018-01-22 11:48:51 -08:00
Jihoon Son	241efafbb2	Automatic compaction by coordinators (#5102 ) * Automatic compaction by coordinator * add links * skip compaction for very recent segments if they are small * fix finding search interval * fix finding search interval * fix TimelineHolder iteration * add test for newestSegmentFirstPolicy * add CompactionSegmentIterator * add numTargetCompactionSegments * add missing config * fix skipping huge shards * fix handling large number of segments per shard * fix test failure * change recursive call to loop * fix logging * fix build * fix test failure * address comments * change dataSources type * check running pendingTasks at each run * fix test * address comments * fix build * fix test * address comments * address comments * add doc for segment size optimization * address comment	2018-01-13 13:52:37 +09:00
Gian Merlino	a11049c82f	Fix APPROX_QUANTILE on outer groupBys. (#5253 )	2018-01-12 12:01:32 -08:00
Roman Leventov	8877ce38d6	Enforce modifier order with Checkstyle (#5246 )	2018-01-11 09:50:42 +01:00
Atul Mohan	3cc4a0ab19	Support for encryption of MySQL connections (#5122 ) * Encrypting MySQL connections * Update docs * Make verifyServerCertificate a configurable parameter * Change password parameter and doc update * Make server certificate verification disabled by default * Update tostring * Update docs * Add check for trust store passwords * Add warning for null password	2018-01-10 11:33:54 -08:00
Jihoon Son	5d0619f5ce	Support retrying for PrefetchableTextFilesFirehoseFactory when prefetch is disabled (#5162 ) * Add RetryingInputStream * unnecessary exception * fix PrefetchableTextFilesFirehoseFactoryTest * Fix retrying on connection reset * fix start offset * fix checkstyle * fix check connection reset * address comments * fix compile * address comments * address comments	2018-01-10 17:37:19 +01:00
Parag Jain	83c6c48bed	Fix state check bug in Kafka Index Task (#5204 ) * fix state check for replacement task * fix comments * rebase with master	2018-01-08 18:01:36 -08:00
Jonathan Wei	cdd374a417	Throw away rows with timestamps beyond long bounds in kafka indexing (#5215 ) * Throw away rows with timestamps beyond long bounds in kafka indexing * PR comments	2018-01-08 17:40:50 -06:00
Roman Leventov	579f9fbedf	Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175 ) * Add IndexedInts.debugToString() and AbstractIndex.toString() * Fix AppenderatorTest	2018-01-04 09:56:47 +09:00
David Lim	a7967ade4d	Support replaceExisting parameter for segments pushers (#5187 ) * support replaceExisting parameter for segments pushers * code review changes * code review changes	2018-01-03 16:13:21 -08:00
Slim	c3f7da2128	Remove extra logging by making it debug level (#5193 ) Change-Id: Iaa255862389bdff7fa42b2c08c1e078448b5ee6c	2017-12-23 00:01:10 +03:00
Jihoon Son	9199d61389	Automatic pendingSegments cleanup (#5149 ) * PendingSegments cleanup * fix build * address comments * address comments * fix potential npe * address comments * fix build * fix test * fix test	2017-12-20 14:46:34 -08:00
Parag Jain	c56a9807d4	prevent npe on mismatch between number of kafka partitions and task count (#5139 )	2017-12-20 16:23:15 -06:00
Roman Leventov	f18eba50ee	Remove Aggregator.reset() (#5177 )	2017-12-19 14:09:17 -08:00
Atul Mohan	0eecf2a805	Bump version of druid-basic-security (#5166 )	2017-12-15 11:12:00 -08:00
Roman Leventov	5787d04fad	Bump Druid version to 0.12.0 (#5138 )	2017-12-15 07:37:01 -08:00
Jonathan Wei	f48c9d7be1	Basic auth extension (#5099 ) * Basic auth extension * Add auth configuration integration test * Fix missing authorizerName property * PR comments * Fix missing @JsonProperty annotation * PR comments * more PR comments	2017-12-14 10:36:04 -08:00
Parag Jain	677e24b760	prevent NPE from suppressing actual exception (#5146 )	2017-12-12 11:42:30 -08:00
Roman Leventov	64848c7ebf	DataSegment memory optimizations (#5094 ) * Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers * Fix DataSegmentTest * Refinements * Try to fix * Fix the second DataSegmentTest * Nullability * Fix tests * Fix tests, unify to use TestHelper.getJsonMapper() * Revert TestUtil as ServerTestHelper, fix tests * Add newline * Fix indexing tests * Fix s3 tests * Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper() * Fix HDFS tests * Fix HdfsDataSegmentPusherTest * Capitalize constant names	2017-12-12 11:41:40 -08:00
Jihoon Son	80f5e89a11	Fix DoublesSketchComplexMetricSerde.getSerializer() (#5140 )	2017-12-06 15:10:19 +09:00
Alexander Saydakov	45f91a241e	numeric quantiles sketch aggregator (#5002 ) * numeric quantiles sketch aggregator * it seems that we need to synchronize all methods, which modify the state * Seems like a false positive with -Pstrict * code style fix * code style fix * use sketches-core-0.10.3 * moved cache ids to the central place * better class names * support large columns * explained autodetection, added exception * added comments regarding sketches moving on heap * support reindexing * implemented suggestions from jihoonson * style fix * use max(k, other.k) for better accuracy * check for NilColumnValueSelector instead of null * throw exceptions instead of providing no-op comparators	2017-12-06 08:18:08 +09:00
Roman Leventov	a7a6a0487e	Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762 ) * Replace IOPeon with OutputMedium; Improve compression * Fix test * Cleanup CompressionStrategy * Javadocs * Add OutputBytesTest * Address comments * Random access in OutputBytes and GenericIndexedWriter * Fix bugs * Fixes * Test OutputBytes.readFully() * Address comments * Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes * Add comments to ByteBufferInputStream * Remove unused declarations	2017-12-04 18:04:27 -08:00
Parag Jain	7c01f77b04	Parse Batch support (#5081 ) * add parseBatch and deprecate parse method in InputRowParser add addAll method, skip max rows in memory check for it remove parse method from implementations transform transformers add string multiplier input row parser fix withParseSpec fix kafka batch indexing fix isPersistRequired comments * add unit test * make persist async * review comments	2017-12-04 16:06:16 -06:00
Fokko Driesprong	2487152b59	Update Avro to 1.8.2 (#5075 ) And add exclusions that are required to have a single version of Apache Avro on the classpath.	2017-11-20 20:29:17 -08:00
Slim	e115da39df	Add relogin logic to renew the Kerberos TGT once it expires (#5096 ) * Kerberos TGT will expire after some pre-determined time; this patch adds relogin calls Change-Id: I17ccb9b42aa3032de5d28c8c21e4ffbe8222b815 * exit if the first login passed Change-Id: Ifefd5e9e0dd7d07b05cc493ab1f72415de557ec2	2017-11-20 17:33:39 +05:30
Parag Jain	cb03efeb14	Kafka Index Task that supports Incremental handoffs (#4815 ) * Kafka Index Task that supports Incremental handoffs - Incrementally handoff segments when they hit maxRowsPerSegment limit - Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment - Support for restoring task on middle manager restarts by check pointing end offsets for segments * take care of review comments * make getCurrentOffsets call async, keep track of publishing sequence, review comments * fix setEndoffset duplicate request handling, formatting * fix unit test * backward compatibility * make AppenderatorDriverMetadata backwards compatible * add unit test * fix deadlock between persist and push executors in AppenderatorImpl * fix formatting * use persist dir instead of work dir * review comments * fix deadlock * actually fix deadlock	2017-11-17 16:05:20 -06:00
Jonathan Wei	9ac150c23a	Split internal client escalation from Authenticator interface (#5073 ) * Split internal client escalation from Authenticator interface * PR comments	2017-11-13 19:29:08 -08:00
Roman Leventov	3541b7544b	Prohibit and remove unused declarations in the processing module (#4930 ) * Prohibit and remove unused declarations in the processing module * Fix tests * Fix integration tests * Suppress unused * Try to remove SuppressWarnings unused in VirtualColumn * Remove reset 'false positives' * Annotate CliCommandCreator as ExtensionPoint * Unused import warning instead of error in IntelliJ * Fixes * Add comment * Fix AzureBlob * Fix CloudFilesBlob * Address comments * Add Project SDK section to INTELLIJ_SETUP.md * Fix image	2017-11-09 09:27:27 -08:00
Jihoon Son	5f3c863d5e	Add compaction task (#4985 ) * Add compaction task * added doc * use combining aggregators * address comments * add support for dimensionsSpec * fix getUniqueDims and getUniqueMetics * find unique dimensionsSpec * fix compilation * add unit test * fix test * fix test * test for different dimension orderings and types, and doc for type and ordering * add control for custom ordering and type * update doc * fix compile * fix compile * add segments param * fix serde error * fix build	2017-11-03 21:55:27 -06:00
Roman Leventov	5eb08c27cb	Add Emitter monitoring (#4973 ) * Add Emitter monitoring * Fix typo * Fixes * testing new emitter * Fix failed test (#71) * testing new emitter * fix on failed test * Remove emitter's readTimeout from docs * Update docs * Add HttpEmittingMonitor * Update java-util to 1.3.2	2017-11-03 21:27:57 -06:00
Gian Merlino	5da0241ac8	Kafka: Fixes needlessly low interpretation of maxRowsInMemory. (#5034 ) AppenderatorImpl already applies maxRowsInMemory across all sinks. So dividing by the number of Kafka partitions is pointless and effectively makes the interpretation of maxRowsInMemory lower than expected. This undoes one of the two changes from #3284, which fixed the original bug twice. In this case, that's worse than fixing it once.	2017-11-02 13:45:04 -06:00
Fokko Driesprong	21e1bf68f6	Update Avro to 1.8.0 (#5015 ) The druid parquet extension uses Avro 1.8, and therefore the Avro version itself also needs to be updated to 1.8 to avoid classpath conflicts	2017-11-02 09:08:41 -06:00
Gian Merlino	6c725a7e06	Fix havingSpec on complex aggregators. (#5024 ) * Fix havingSpec on complex aggregators. - Uses the technique from #4883 on DimFilterHavingSpec too. - Also uses Transformers from #4890, necessitating a move of that and other related classes from druid-server to druid-processing. They probably make more sense there anyway. - Adds a SQL query test. Fixes #4957. * Remove unused import.	2017-11-01 12:58:08 -04:00
Gian Merlino	0ce406bdf1	Introduce "transformSpec" at ingest-time. (#4890 ) * Introduce "transformSpec" at ingest-time. It accepts a "filter" (standard query filter object) and "transforms" (a list of objects with "name" and "expression"). These can be used to do filtering and single-row transforms without the need for a separate data processing job. The "expression" fields use the same expression language as other expression-based features. * Remove forbidden api. * Fix compile error. * Fix tests. * Some more changes. - Add nullable annotation to Firehose.nextRow. - Add tests for index task, realtime task, kafka task, hadoop mapper, and ingestSegment firehose. * Fix bad merge. * Adjust imports. * Adjust whitespace. * Make Transform into an interface. * Add missing annotation. * Switch logger. * Switch logger. * Adjust test. * Adjustment to handling for DatasourceIngestionSpec. * Fix test. * CR comments. * Remove unused method. * Add javadocs. * More javadocs, and always decorate. * Fix bug in TransformingStringInputRowParser. * Fix bad merge. * Fix ISFF tests. * Fix DORC test.	2017-10-30 17:38:52 -07:00
elloooooo	52a162e302	define earlyMessegeRejectPeriod as the period after the taskduration (#4990 )	2017-10-27 01:13:46 +05:30
Roman Leventov	dc7cb117a1	Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886 ) * Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism * Fix MapVirtualColumn.makeColumnValueSelector() * Minor fixes * Fix IndexGeneratorCombinerTest * DimensionSelector to return zeros when treated as numeric ColumnValueSelector * Fix IncrementalIndexTest * Fix IncrementalIndex.makeColumnSelectorFactory() * Optimize MapBasedRow.getMetric() * Fix VarianceAggregatorTest * Simplify IncrementalIndex.makeColumnSelectorFactory() * Address comments * More comments * Test	2017-10-13 21:44:17 -05:00
Jihoon Son	8d9902831e	Refactoring PrefetchableTextFilesFirehoseFactory (#4836 ) * Refactoring prefetchable firehose * Fix to read cache when prefetch is disabled * More tests * Cleanup codes * Add Fetcher * Fix test failure * Count file size * Fix test * rename generic parameter * address comments * address comments * reuse buffer * move Execs to java-util * use execs * Fix build	2017-10-13 21:39:28 -05:00
Jihoon Son	675c6c00dd	Add checkstyle and intellij rule to prohibit unnecessary qualifiers in interfaces (#4958 ) * add checkstyle and intellij rule * fix tc fail	2017-10-13 07:56:19 -07:00
Atul Mohan	c07678b143	Synchronization of lookups during startup of druid processes (#4758 ) * Changes for lookup synchronization * Refactor of Lookup classes * Minor refactors and doc update * Change coordinator instance to be retrieved by DruidLeaderClient * Wait before thread shutdown * Make disablelookups flag true by default * Update docs * Rename flag * Move executorservice shutdown to finally block * Update LookupConfig * Refactoring and doc changes * Remove lookup config constructor * Revert Lookupconfig constructor changes * Add tests to LookupConfig * Make executorservice local * Update LRM * Move ListeningScheduledExecutorService to ExecutorCompletionService * Move exception to outer block * Remove check to see future is done * Remove unnecessary assignment * Add logging	2017-10-12 21:22:24 -05:00
Jihoon Son	d95915f8d2	Implement get methods for PrefetchableFirehose (#4948 )	2017-10-12 16:14:45 +09:00
Jihoon Son	dfa9cdc982	Prioritized locking (#4550 ) * Implementation of prioritized locking * Fix build failure * Fix tc fail * Fix typos * Fix IndexTaskTest * Addressed comments * Fix test * Fix spacing * Fix build error * Fix build error * Add lock status * Cleanup suspicious method * Add nullables * add doInCriticalSection to TaskLockBox and revert return type of task actions * fix build * refactor CriticalAction * make replaceLock transactional * fix formatting * fix javadoc * fix build	2017-10-11 23:16:31 -07:00
Jihoon Son	56fb11ce0b	Lazy initialization for JavaScript functions (#4871 ) * Lazy initialization of JavaScript functions * Fix test failure * Fix thread-safety and postpone js conf check * Fix test fail * Fix test * Fix KafkaIndexTaskTest * Move config check	2017-10-10 21:52:42 -07:00
Gian Merlino	b20e3038b6	SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. (#4889 ) * SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. This brings benefits: - Ability to do GROUP BY and ORDER BY with ordinals. - Ability to support IN filters beyond 19 elements (fixes #4203). Some refactoring of druid-sql internals: - Builtin aggregators and operators are implemented as SqlAggregators and SqlOperatorConversions rather than being special cases. This simplifies the Expressions and GroupByRules code, which were becoming complex. - SqlAggregator implementations are no longer responsible for filtering. Added new functions: - Expressions: strpos. - SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR, and DATE_TRUNC. * Add missing @Override annotation. * Adjustments for forbidden APIs. * Adjustments for forbidden APIs. * Disable GROUP BY alias. * Doc reword.	2017-10-10 12:44:05 -07:00
chunghochen	0614b92df1	adding new post aggregators for test statistics to druid-stats extension (#4532 ) * adding new post aggregators of test stats to druid-stats extension * changes to address code review comments * fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master * add @Override annotation per CI log * make changes per review comments/discussions * remove some blocks per review comments	2017-10-09 23:43:27 -07:00
Guillaume Balaine	35944d24ae	Fix JdbcCacheGenerator, null values shouldn't be allowed (#4881 ) * Fix JdbcCacheGenerator, null values shouldn't be allowed * Add a test case for null values	2017-10-06 09:31:48 -07:00
Alexander Saydakov	bba96f59f8	added missing synchronized keyword (#4894 ) * added missing synchronized keyword * added missing synchronized keyword	2017-10-03 12:16:54 -05:00
Jonathan Wei	5e60ccade1	Add context map to AuthenticationResult (#4870 )	2017-10-02 17:08:14 -05:00
Gian Merlino	1f2074c247	Bump versions in master to 0.11.1-SNAPSHOT. (#4878 ) * Bump versions in master to 0.11.1-SNAPSHOT. * Missed a few.	2017-09-28 17:09:51 -05:00
Himanshu	f69c9280c4	remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form (#4858 ) * remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form * sanitize output of /druid/coordinator/v1/cluster endpoint	2017-09-28 10:40:59 -05:00
Goh Wei Xiang	2c30d5ba55	Add org.joda.time.DateTime.parse() to forbidden APIs (#4857 ) * Added org.joda.time.DateTime#(java.lang.String) to forbidden API. * Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API. * Add additional APIs that may create DateTime with default time zone * Add helper function that accepts formatter to parse String. * Add additional forbidden APIs * Replace existing usage of forbidden APIs * Use wrapper class to enforce Chronology on DateTimeFormatter. * Creates constant UtcFormatter for constant ISODateTimeFormat.	2017-09-27 17:46:44 -05:00
Alexander Saydakov	c3fbe5158d	use latest sketches-core-0.10.1 and memory-0.10.3 (#4828 ) * use latest sketches-core-0.10.1 and memory-0.10.3 * style fix * better variable name * removed explicit dependency on memory	2017-09-27 15:18:33 -05:00
Roman Leventov	c702ac771f	Fix formatting in ApproximateHistogramTest (#4853 )	2017-09-26 15:14:25 -05:00
Gino Ledesma	e60bc0cabc	bug: getQuantiles() returns values that exceed max (#4744 ) Fixes https://github.com/druid-io/druid/issues/3972	2017-09-26 10:43:56 -07:00
Gian Merlino	bf8fd4c203	Add flattenSpec support to the Avro parser. (#4832 ) * Add flattenSpec support to the Avro parser. Also: - Refactor the JSONPathParser a bit so it can share flattening code with Avro (see ObjectFlatteners). - Remove the JSONParser. It was only used in two places: by UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated the former to JSONPathParser and made the latter a standalone. - Move GenericRecordAsMap to the Parquet extension, since the Avro extension no longer uses it. * Fix indentation. * Fix equals/hashCode.	2017-09-26 09:26:06 -07:00
Roman Leventov	b56a907145	Add namespace extraction thread config (#4833 )	2017-09-25 09:52:36 -07:00
Parag Jain	07446ef32c	warn if topic not found (#4834 )	2017-09-25 12:21:46 +09:00
Charles Allen	a6470c1d03	Move caffeine out of extension and make it the default cache implementation. (#4810 ) * Move caffeine out of extension. * Remove `JsonTypeName` from the class itself * Fix bad docs * Fix distribution pom * Fix unused import * Make caffeine default * Address code comments * Add more description around the jre version in the readme * Add suggested comments	2017-09-22 10:46:55 -07:00
Roman Leventov	e267f3901b	Enforce Indentation with Checkstyle (#4799 )	2017-09-21 13:06:48 -07:00
Roman Leventov	88e9a80636	Rename ObjectValueSelector.get() to getObject(); Add getObject() and classOfObject() to ColumnValueSelector (#4801 )	2017-09-19 14:47:20 -05:00
Charles Allen	e38705e348	Add timing to log for URI based Lookup fetching (#4805 ) * Add timing to log for URI based metrics * Reformat	2017-09-18 11:18:32 -05:00
Gian Merlino	96612cc665	Fix incorrect log formatting in DruidKerberosAuthenticationHandler. (#4817 )	2017-09-17 22:41:36 -07:00
Jonathan Wei	c2a0e753b6	Extension points for authentication/authorization (#4271 ) * Extension points for authentication/authorization * Address some PR comments * Authorization result caching * Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter * Use Set for auth caching, close outputstreams in filters * Don't close output stream on success in sanity check filter * Add ConfigResourceFilter to coordinator lookups * Fix filtering authorization check for empty resource list * HttpClient users must explicitly escalate the client * Remove response modification from PreResponseAuthorizationCheckFilter * Remove extraneous pom.xml * Fix unit test * Better lifecycle management * Rename AuthorizationManager to Authorizer * Fix authorization denials for empty supervisor list * Address some PR comments * Address more PR comments * Small cleanup * Add Jetty HttpClient wrapper to Authenticator * Remove Authorizer start/stop * Restore immutable context map in DruidConnection, UT fix * Fix/update docs * Add authorization checks to EventReceiverFirehose * Fix router authorization check failure, restore PreResponseAuthorizationFilter changes * Compile fixes * Test fixes * Update Authenticator/Authorizer doc comments * Merge fixes * PR comments * Fix test * Fix IT * More PR comments * PR comments * SSL fix	2017-09-15 23:45:48 -07:00
Roman Leventov	3f92184dd8	Inspection fixes (#4809 )	2017-09-15 17:48:29 -07:00
Roman Leventov	cd5de123bd	Self-checking S3DataSegmentMover.safeMove() (#4725 ) * Self-checking S3DataSegmentMover.safeMove() * Remove unused in S3DataSegmentMoverTest * Address comments * More specific exceptions * Remove delete check	2017-09-14 13:49:21 -07:00
Jonathan Wei	3a29521273	Fix GroupBy limit push down error when buffer is too small (#4745 ) * Fix GroupBy limit push down error when buffer is too small * Address PR comments	2017-09-12 12:34:50 -07:00
Gian Merlino	34a03b8e6c	SQL: EXPLAIN improvements. (#4733 ) * SQL: EXPLAIN improvements. - Include query JSON in explain output. - Fix a bug where semi-joins and nested groupBys were not fully explained. - Fix a bug where limits were not included in "select" query explanations. * Fix compile error. * Fix compile error. * Fix tests.	2017-09-01 09:35:13 -07:00
Himanshu	4c04083926	kafkaIndexTask unannounce service in final block (#4736 )	2017-09-01 09:31:15 -07:00
Charles Allen	bdfc6fe25e	Move common TypeReference into JacksonUtils (#4738 )	2017-08-31 13:40:16 -07:00
Parag Jain	594a66f3c0	add scheme to AsyncQueryForwardingServlet (#4688 ) * add scheme to AsyncQueryForwardingServlet * add sslContext binding for Router	2017-08-28 15:03:43 -07:00
hzy001	4f61dc66a9	Remove the deprecated variable localChildren (#4357 ) Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>	2017-08-24 15:27:34 -05:00
Roman Leventov	cacf63b007	Add AggregateCombiners (#4676 ) * Add MetricCombiners * Rename MetricCombiner to AggregateCombiner * Spelling * Fix TimestampAggregatorFactory.combine() and add makeAggregateCombiner() implementation * Rename AggregateCombiner.combine() to fold()	2017-08-21 16:45:29 -07:00
Roman Leventov	cbd1902db8	Add forbidden-apis plugin; prohibit using system time zone (#4611 ) * Forbidden APIs WIP * Remove some tests * Restore io.druid.math.expr.Function * Integration tests fix * Add comments * Fix in SimpleWorkerProvisioningStrategy * Formatting * Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest * Address comments * Fix GroupByMultiSegmentTest	2017-08-21 13:02:42 -07:00
Himanshu	74a64c88ab	internal-discovery: interfaces for announcement/discovery, curator based impls (#4634 ) * internal-discovery: interfaces for announcement/discovery, curator impls * more tests * address some review comments * more fixes * address more review comments * simplify ObjectMapper setup in CuratorDruidNodeAnnouncerAndDiscoveryTest * fix KafkaIndexTaskTest * make lookupTier overridable via RealtimeIndexTask and KafkaIndexTask context * make teamcity build happy	2017-08-16 13:07:16 -07:00
Parag Jain	725a144096	add localhost as advertised hostname (#4689 ) * add localhost as advertised hostname * set advertised.host.name to localhost for test kafka broker	2017-08-14 16:59:26 -07:00
Roman Leventov	bf28d0775b	Remove QueryRunner.run(Query, responseContext) and related legacy methods (#4482 ) * Remove QueryRunner.run(Query, responseContext) and related legacy methods * Remove local var	2017-08-11 09:12:38 +09:00
Yuewen Wang	c821bc9a5a	Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607 ) * Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 * implement the logic of this param * Added doc of this config * Added unit tests of it * Update KafkaSupervisor.java ameliorate comment * fix format * fix bug when rebasing	2017-08-11 09:12:08 +09:00
Peter Cunningham	ede7cf9eef	Added support for where clauses to JDBC lookups. (#4643 ) * Added support for where clauses to filter lookup values on ingestion. Added a filter field to the JDBC lookups that is used to generate a where clause so that only rows matching the filter value will be brought into Druid. Example being filter="SOMECOLUMN=1" * Required changes based on code review. * Required changes based on code review. * Added support for where clauses to filter lookup values on ingestion. Added a filter field to the JDBC lookups that is used to generate a where clause so that only rows matching the filter value will be brought into Druid. Example being filter="SOMECOLUMN=1" * Updates based on code review, mainly formatting and small refactor of the buildLookupQuery method. * Fixed broken buildLookupQuery method * Removed empty line. * Updates per review comments	2017-08-09 10:47:46 -07:00
Roman Leventov	7454fd86a0	Polymorphic numeric getters for ColumnValueSelector (#4623 ) * Add methods getFloat(), getDouble() and getLong() to ColumnValueSelector * Fix copy-paste mistake in docs * Spelling	2017-08-08 18:38:06 -07:00
Jihoon Son	d5606bc558	Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549 ) * Passing lockTimeout as a parameter for TaskLockbox.lock() * Remove TIME_UNIT * Fix tc fail * Add taskLockTimeout to TaskContext * Add caution	2017-08-08 18:21:07 -07:00
Roman Leventov	f5d4171459	Prohibit for loops which could be foreach with IntelliJ (#4653 ) * Replace for with foreach * Replace for with for-each in GroupByQueryEngineV2 * Remove io.druid.collections.IntList	2017-08-08 18:05:33 -07:00
Charles Allen	bbe7fb8c46	Better logging for S3DataSegmentPuller `getVersion` (#4657 ) * Eventual consistency of S3 means a `404` can be thrown. It helps to know the URI that was attempted.	2017-08-08 16:21:22 +03:00
Roman Leventov	aa7e4ae5e4	Enforce correct spacing with Checkstyle (#4651 )	2017-08-05 10:18:25 -07:00
Jihoon Son	f3f2cd35e1	Array-based aggregation for groupBy query (#4576 ) * Array-based aggregation * Fix handling missing grouping key * Handle invalid offset * Fix compilation * Add cardinality check * Fix cardinality check * Address comments * Address comments * Address comments * Address comments * Cleanup GroupByQueryEngineV2.process * Change to Byte.SIZE * Add flatMap	2017-08-03 20:04:54 +03:00
Charles Allen	8921538251	Make AWSCredentialsConfig use PasswordProvider for the string matter (#4613 ) * Make AWSCredentialsConfig use PasswordProvider for the string matter * Fixes https://github.com/druid-io/druid/issues/3911 * Add unit tests	2017-07-29 15:48:49 -07:00
Roman Leventov	5929066dfb	Add NamespaceLookupExtractorFactory.toString() (#4606 )	2017-07-26 12:02:07 -07:00
Gian Merlino	5048ab3e96	Add metrics to the native queries underpinning SQL. (#4561 ) * Add metrics to the native queries underpinning SQL. This is done by factoring out the metrics and request log emitting code from QueryResource into a new QueryLifecycle class. That class is used by both QueryResource and the SQL DruidSchema and QueryMaker. Also fixes a couple of bugs in QueryResource: - RequestLogLine start time was set to `TimeUnit.NANOSECONDS.toMillis(startNs)`, which is incorrect since absolute nanos cannot be converted to millis. - DruidMetrics.makeRequestMetrics was called with null `query` on unparseable queries, which led to spurious "Unable to log query" errors. Partial fix for #4047. * Code style * Remove unused imports. * Fix tests. * Remove unused import.	2017-07-24 21:26:27 -07:00
Roman Leventov	c0beb78ffd	Enforce brace formatting with Checkstyle (#4564 )	2017-07-21 10:26:59 -05:00
Gian Merlino	2be7068f6e	Fixes and improvements to SQL metadata caching. (#4551 ) * Fixes and improvements to SQL metadata caching. Also adds support for MultipleSpecificSegmentSpec to CachingClusteredClient. SQL changes: - Cache metadata on a per-segment level, in addition to per-dataSource, so we don't need to re-query all segments whenever a single new one appears. This should lower the load placed on the cluster by metadata queries. - Fix race condition in DruidSchema that can cause us to miss metadata. It was possible to notice new segments, then issue a query, and have that query not actually hit those segments, and not notice that it didn't hit those segments. Then, the metadata from those segments would be ignored. - Fix assumption in DruidSchema that all segments are immutable. Now, mutable segments are periodically re-queried. - Fix inappropriate re-use of SchemaPlus. Now we create one for each planning cycle, rather than sharing one. It caches table objects, which we want to avoid, since it can cause stale metadata. We do the caching in DruidSchema so we don't need the SchemaPlus caching. Server changes: - Add a TimelineCallback to TimelineServerView, for callers that want to get updates when the timeline has been modified. - Change CachingClusteredClient from a QueryRunner to a QuerySegmentWalker. This allows it to accept queries that are segment-descriptor-based rather than intervals-based. In particular it will now support MultipleSpecificSegmentSpec. * Fix DruidSchema, and unused imports. * Remove unused import. * Fix SqlBenchmark.	2017-07-20 10:14:15 -07:00
Slim	71e7a4c054	Adding double columns support (#4491 ) * add double columns support * Fix numbers and expected results in UTs * adding float aggregators * fix IT expected test results * fix comments * more fixes * fix comp * fix test * refactor double and float aggregator factories * fix * fix UTs * fix comments * clean unused code * fix more comments * undo unnecessary changes * fix null issue * refactor TopNColumnSelectorStrategyFactory * fix docs * refactor NumericTopNColumnSelectorStrategy * fix return * fix comments * handle the null case in DimensionIndexer * more null fixing * cosmetic changes	2017-07-20 10:14:14 +03:00
Gian Merlino	441ee56ba9	DataSegmentPusher: Add allowed hadoop property prefixes. (#4562 ) * DataSegmentPusher: Add allowed hadoop property prefixes. * Fix dots.	2017-07-18 10:16:12 -07:00
Roman Leventov	60cdf94677	Add PMD and prohibit unnecessary fully qualified class names in code (#4350 ) * Add PMD and prohibit unnecessary fully qualified class names in code * Extra fixes * Remove extra unnecessary fully-qualified names * Remove qualifiers * Remove qualifier	2017-07-17 22:22:29 +09:00
Chris Gavin	960cb07ea6	Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474 ) * Remove some unnecessary use of boxed types. * Fix some incorrect format strings. * Enable IDEA's MalformedFormatString inspection. * Add a Checkstyle check for finding uses of incorrect logging packages. * Fix some incorrect usages of the metamx logger. * Bypass incorrect logger Checkstyle check where using the correct logger is not simple. * Fix some more places where the wrong number of arguments are provided to format strings. * Suppress `MalformedFormatString` inspection on legacy logging test. * Use @SuppressWarnings rather than a noinspection suppression comment. * Fix some more incorrect format strings. * Suppress some more incorrect format string warnings where the incorrect string is intentional. * Log the aggregator when closing it fails. * Remove some unneeded log lines.	2017-07-13 12:15:32 -07:00
Roman Leventov	b2865b7c7b	Make it possible to start Peon without DI loading of any querying-related stuff (#4516 ) * Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory * Extract QueryablePeonModule and add druid.modules.excludeList config * Typo	2017-07-12 13:18:25 -05:00
Akash Dwivedi	a108d05f76	Use GenericIndexed v2 supported read() during deserializeColumn (#4463 )	2017-07-11 10:18:25 -05:00
Akash Dwivedi	5f411f14af	Timeout for LockAcquireAction (#4461 ) * Timeout for LockAcquireAction * Static inner class. * Rebase changes. * makeAlert and throw exception incase of overlapping interval. * Addressed comments. * remove unused import. * Addressed comments	2017-07-11 18:59:32 +09:00
Jihoon Son	cc20260078	Early publishing segments in the middle of data ingestion (#4238 ) * Early publishing segments in the middle of data ingestion * Remove unnecessary logs * Address comments * Refactoring the patch according to #4292 and address comments * Set the total shard number of NumberedShardSpec to 0 * refactoring * Address comments * Fix tests * Address comments * Fix sync problem of committer and retry push only * Fix doc * Fix build failure * Address comments * Fix compilation failure * Fix transient test failure	2017-07-10 22:35:36 -07:00
Jihoon Son	8ed25acc15	Fix a bug for CSVParser/DelimitedParser when empty column exists in the header row (#4504 ) * Fix a bug when empty column exists in header row * Address comments	2017-07-07 16:19:25 -07:00
Gian Merlino	16817e408d	SQL + Expressions = Best friends forever. (#4360 ) * SQL + Expressions = Best friends forever. - Use expressions as a projection layer for anything that can't be expressed using traditional Druid extractionFns. Sometimes they're embedded directly (like "expression" filters, builtin aggregators, or "expression" post-aggregators). Sometimes they're referenced through virtual columns (like dimensionSpecs, which can't innately reference functions of more than one column without the virtual column layer). - Add many new functions and operators, taking advantage of the expression capability (see the querying/sql.md doc). - Improve consistency of constant reduction and of casting by using Druid expressions for this instead of Calcite's RexExecutor. * Fix casting bug, and other code review comments. * Fix docs.	2017-07-07 08:48:26 -07:00
Roman Leventov	d168a4271e	Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY (#4496 ) * Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float * Replace usages in comments * Fix RTree * Remove commented code * Add tests	2017-07-07 09:10:13 -06:00
Parag Jain	6e2f78f552	TLS support (#4270 )	2017-07-06 17:40:12 -07:00
Roman Leventov	9ae457f7ad	Avoid using the default system Locale and printing to System.out in production code (#4409 ) * Avoid usages of Default system Locale and printing to System.out or System.err in production code * Fix Charset in DruidKerberosUtil * Remove redundant string format in GenericIndexed * Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format() * Fix testSafeFormat() * More fixes of redundant StringUtils.format() inside ISE * Rename unimportantSafeFormat() to nonStrictFormat()	2017-06-29 14:06:19 -07:00
Roman Leventov	ae900a4934	Update versions to 0.11.0-SNAPSHOT (#4483 )	2017-06-28 17:05:58 -07:00
Gian Merlino	4c33d0a00f	Add some new expression functions and macros. (#4442 ) * Add some new expression functions and macros. See misc/math-expr.md for the list of added functions, except for "like", which previously existed but was not documented. * Add easymock to datasketches tests. * Add easymock to distinctcount tests. * Add easymock to virtual-columns tests. * Code review comments. * Clean up code a bit. * Add easymock to scan-query tests. * Rework ExprMacros that have multiple impls. * Improve test coverage.	2017-06-28 10:15:58 -07:00
Jihoon Son	79fd5338e3	Get s3 objects directly from prefixes when listing is failed due to permission (#4444 ) * Fall back to getObject when listing is failed due to permission * Throws an exception when listing is not allowed on directory * Fix error messages	2017-06-27 18:58:37 -07:00
Roman Leventov	05d58689ad	Remove the ability to create segments in v8 format (#4420 ) * Remove ability to create segments in v8 format * Fix IndexGeneratorJobTest * Fix parameterized test name in IndexMergerTest * Remove extra legacy merging stuff * Remove legacy serializer builders * Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest	2017-06-26 13:21:39 -07:00
Jihoon Son	5fec619284	Make KafkaLookupExtractorFactoryTest fast (#4466 ) * Make KafkaLookupExtractorFactoryTest fast * Use list * Use Bytes	2017-06-26 10:15:28 -05:00
Himanshu	61c38b66ad	exclude aws-java-sdk from hadoop-aws dep in hdfs-storage module (#4437 ) * exclude aws-java-sdk from hdfs-storage module * address review comments	2017-06-22 15:56:35 -05:00
Goh Wei Xiang	f68a0693f3	Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397 ) * Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe. * - Use of computeIfAbsent over putIfAbsent - Replace Maps.newXXXMap() with normal instantiation - Documentation on when thread safety is required. - Use Builders for On/OffheapIncrementalIndex * - Optimization on computeIfAbsent - Constant EMPTY DimensionsSpec - Improvement on IncrementalIndexSchema.Builder - Remove setting of default values - Use var args for metrics - Correction on On/OffheapIncrementalIndex Builders - Combine On/OffheapIncrementalIndex Builders * - Removing unused imports. * - Helper method for testing with IncrementalIndex.Builder * - Correction on javadoc. * Style fix	2017-06-16 16:04:19 -05:00
Gian Merlino	17ef785618	Speed up sketch tests by merging fewer indexes. (#4413 ) The tests go from 5 minutes to about 10 seconds. 1000 maxRowCount is still low enough to get a few merges, so we're still exercising that functionality.	2017-06-15 14:47:55 -05:00
Roman Leventov	976492c186	Make PolyBind to fail if property value is not found (fixes #4369 ) (#4374 ) * Make PolyBind to fail if property value is not found * Fix test * Add onHeap option in NamespaceExtractionModule * Add PolyBind.createChoiceWithDefaultNoScope() * Fix NPE * Fix * Configure MetadataStorageProvider option for MySQL, PostgreSQL and SQLServer * Deprecate PolyBind.createChoiceWithDefault form with unused defaultKey * Fix NPE	2017-06-13 09:45:43 -07:00
Roman Leventov	c121845102	Avoid using Guava in DataSegmentPushers because of incompatibilities (#4391 ) * Avoid using Guava in DataSegmentPushers because of Hadoop incompatibilities * Clarify comments	2017-06-12 09:58:34 -07:00
Roman Leventov	5285eb961b	Update dependencies (#4313 ) * Update dependencies * Downgrade curator * Rollback aws-java-sdk dependency to 1.10.77 * Revert exclusions in integration-tests * Depend only on aws-java-sdk-ec2 instead of umbrella aws-java-sdk (fixes #4382)	2017-06-09 14:32:07 -07:00
Niketh Sabbineni	2cd91b64d0	Uncompress streams without having to download to tmp first (#4364 ) * Uncompress streams without having to download to tmp first * Remove unused file	2017-06-08 18:08:38 -07:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Roman Leventov	63a897c278	Enable most IntelliJ 'Probable bugs' inspections (#4353 ) * Enable most IntelliJ 'Probable bugs' inspections * Fix in RemoteTestNG * Fix IndexSpec's equals() and hashCode() to include longEncoding * Fix inspection errors * Extract global instance of natural().nullsFirst(); address comments * Fix * Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections * Prohibit Ordering.natural().nullsFirst() using Checkstyle	2017-06-07 09:54:25 -07:00
Roman Leventov	31d33b333e	Make using implicit system Charset an error (#4326 ) * Make using implicit system charset an error * Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String() * Use English locale in StringUtils.safeFormat() * Restore comment	2017-06-05 23:57:25 -07:00
David Lim	13ecf90923	Report Kafka lag information in supervisor status report (#4314 ) * refactor lag reporting and report lag at status endpoint * refactor offset reporting logic to fetch offsets periodically vs. at request time * remove JavaCompatUtils * code review changes * code review changes	2017-06-05 13:26:25 -07:00
Slim	a2584d214a	Delegate creation of segmentPath/LoadSpec to DataSegmentPushers and add S3a support (#4116 ) * Adding s3a schema and s3a implem to hdfs storage module. * use 2.7.3 * use segment pusher to make loadspec * move getStorageDir and makeLoadSpec under DataSegmentPusher * fix uts * fix comment part1 * move to hadoop 2.8 * inject deep storage properties * set version to 2.7.3 * fix build issue about static class * fix comments * fix default hadoop default coordinate * fix create filesystem * downgrade aws sdk * bump the version	2017-06-04 00:55:09 -06:00
Roman Leventov	ebabe14fbe	Rename ExtractionNamespaceCacheFactory to CachePopulator (the last part of #3667 ) (#4303 ) * Renamed ExtractionNamespaceCacheFactory to CachePopulator, and related classes * Rename CachePopulator to CacheGenerator	2017-06-03 10:09:44 +09:00
Jihoon Son	da32e1ae53	Reducing testing time for KafkaIndexTaskTest and KafkaSupervisorTest (#4352 )	2017-06-03 00:53:07 +09:00
Jihoon Son	f876246af7	Rename FiniteAppenderatorDriver to AppenderatorDriver (#4356 )	2017-06-03 00:48:44 +09:00
kaijianding	0efd18247b	explicitly unmap hydrant files when abandonSegment to recycle mmap memory (#4341 ) * fix TestKafkaExtractionCluster fail due to port already used * explicitly unmap hydrant files when abandonSegment to recycle mmap memory * address the comments * apply to AppenderatorImpl	2017-06-01 18:15:30 -05:00
Jihoon Son	1150bf7a2c	Refactoring Appenderator Driver (#4292 ) * Refactoring Appenderator 1) Added publishExecutor and handoffExecutor for background publishing and handing segments off 2) Change add() to not move segments out in it * Address comments 1) Remove publishTimeout for KafkaIndexTask 2) Simplifying registerHandoff() 3) Add incremental handoff test * Remove unused variable * Add persist() to Appenderator and more tests for AppenderatorDriver * Remove unused imports * Fix strict build * Address comments	2017-06-02 07:09:11 +09:00
Kenji Noguchi	3400f601db	Protobuf extension (#4039 ) * move ProtoBufInputRowParser from processing module to protobuf extensions * Ported PR #3509 * add DynamicMessage * fix local test stuff that slipped in * add license header * removed redundant type name * removed commented code * fix code style * rename ProtoBuf -> Protobuf * pom.xml: shade protobuf classes, handle .desc resource file as binary file * clean up error messages * pick first message type from descriptor if not specified * fix protoMessageType null check. add test case * move protobuf-extension from contrib to core * document: add new configuration keys, and descriptions * update document. add examples * move protobuf-extension from contrib to core (2nd try) * touch * include protobuf extensions in the distribution * fix whitespace * include protobuf example in the distribution * example: create new pb obj everytime * document: use properly quoted json * fix whitespace * bump parent version to 0.10.1-SNAPSHOT * ignore Override check * touch	2017-05-30 13:11:58 -07:00
Jihoon Son	7889891bd3	Fix integration tests (#4337 ) * Fix integration tests 1) Use the same version of kafka 2) Change ServiceEmitter from LazySingleton to ManageLifecycle * Revert unnecessary change	2017-05-28 08:48:39 -07:00
Gian Merlino	fe42db98ac	URIExtractionNamespace: Avoid problems due to canonicalization of lookup fields. (#4307 ) Disables canonicalization for simpleJson, where expect field names to be unique anyway. Keeps canonicalization enabled for customJson, but avoids sharing the table with the global ObjectMapper.	2017-05-24 17:41:04 -07:00
Jonathan Wei	d49e53e6c2	Timeout and maxScatterGatherBytes handling for queries run by Druid SQL (#4305 ) * Timeout and maxScatterGatherBytes handling for queries run by Druid SQL * Address PR comments * Fix contexts in CalciteQueryTest * Fix contexts in QuantileSqlAggregatorTest	2017-05-23 16:57:51 +09:00
Roman Leventov	7479cbde68	Make CacheScheduler a singleton (#4293 )	2017-05-18 15:46:02 -07:00
Jihoon Son	733dfc9b30	Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193 ) * Add PrefetcheableTextFilesFirehoseFactory * fix comment * exception handling * Fix wrong json property * Remove ReplayableFirehoseFactory and fix misspelling * Defer object initialization * Add a temporaryDirectory parameter to FirehoseFactory.connect() * fix when cache and fetch are disabled * Address comments * Add more test * Increase timeout for test * Add wrapObjectStream * Move methods to Firehose from PrefetchableFirehoseFactory * Cleanup comment * add directory listing to s3 firehose * Rename a variable * Addressing comments * Update document * Support disabling prefetch * Fix race condition * Add fetchLock * Remove ReplayableFirehoseFactoryTest * Fix compilation error * Fix test failure * Address comments * Add default implementation for new method	2017-05-18 15:37:18 +09:00
Himanshu	daa8ef8658	Optional long-polling based segment announcement via HTTP instead of Zookeeper (#3902 ) * Optional long-polling based segment announcement via HTTP instead of Zookeeper * address review comments * make endpoint /druid-internal/v1 instead of /druid/internal so that jetty qos filters can be configured easily when needed * update segment callback initialization to be called only after first segment list fetch has succeeded from all servers * address review comments * remove size check not required anymore as only segment servers announce themselves and not all peon processes * announce segment server on historical only after cached segments are loaded * fix checkstyle errors	2017-05-17 16:31:58 -05:00
Roman Leventov	d400f23791	Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798 ) (#4079 ) * Monomorphic processing of topN queries with simple double aggregators and historical segments * Add CalledFromHotLoop annotations to specialized methods in SimpleDoubleBufferAggregator * Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype * Fix a bug in SpecializationService * In SpecializationService, emit maxSpecializations warning only once * Make GenericIndexed.theBuffer final * Address comments * Newline * Reapply `439c906` (Make GenericIndexed.theBuffer final) * Remove extra PooledTopNAlgorithm.capabilities field * Improve CachingIndexed.inspectRuntimeShape() * Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape() * Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts * Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop * Make ValueMatcher to implement HotLoopCallee * Doc fix * Fix inspectRuntimeShape() impl in ExpressionSelectors * INFO logging of specialization events * Remove modifier * Fix OrFilter * Fix AndFilter * Refactor PooledTopNAlgorithm.scanAndAggregate() * Small refactoring * Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations * Don't care about runtime shape in tests * Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests * Doc wording * Address comments * Remove MagicAccessorBridge and ensure Offset subclasses are public * Attach error message to element	2017-05-16 16:19:55 -07:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274 ) * Make MissingOverride an error * Make travis script fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
David Lim	8333043b7b	add skipOffsetGaps flag (#4256 )	2017-05-16 12:19:28 -06:00
Benedict Jin	e823085866	Improve `collection`-related code by reusing an immutable object instead of creating a new object (#4135 )	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254 ) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Fokko Driesprong	5ca67644e7	Remove slf4j as dependencies (#4233 ) From the kafka-schema-registry-client in the avro extension slf4j will be packaged into the distribution. We don't want this as it will conflict and throw a slf4j multiple bindings warning. This will cause slf4j to fall back to no-operation (NOP) binding.	2017-05-12 15:59:14 +09:00
Roman Leventov	1ebfa22955	Update Error prone configuration; Fix bugs (#4252 ) * Make Errorprone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule an error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Roman Leventov	e09e892477	Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798 ) (#4184 ) * Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map). * Fix GroupByMergingQueryRunnerV2 * Fix QueryResourceTest * Expand the comment to Query.run(walker, context) * Remove legacy version of BySegmentSkippingQueryRunner.doRun() * Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs	2017-05-10 12:25:00 -07:00
Parag Jain	1fd177039d	fix auto reset - pause task instead of putting thread to sleep (#4244 )	2017-05-08 15:08:25 -07:00
Parag Jain	eb8e1b0a97	Prevent interrupted exception from polluting log during supervisor shutdown (#4253 ) * Prevent interrupted exception from polluting log during supervisor shutdown * do nothing in case of InterruptedException	2017-05-08 15:05:25 -07:00
Parag Jain	4502c207af	fix injection bug and documentation (#4243 )	2017-05-03 15:07:43 -05:00
Parag Jain	f9a61ea2ba	Kafka lag emitter - Kafka Indexing Service (#4194 ) * Kafka lag emitter * enforce minimum emit period to a minute * fixed comment	2017-05-02 17:30:07 -06:00
Roman Leventov	0bc18e7906	Make UpdateCounter proof to update count overflow (#4138 ) * Make UpdateCounter proof to update count overflow. * Fix	2017-05-01 09:59:49 -07:00
Bas van Schaik	54463941b9	Fix two alerts from lgtm.com: comparing two boxed primitive values using (#4212 ) the == or != operator compares object identity, which may not be intended Details: `013566ade9/files/extensions-core/datasketches/src/main/java/io/druid/query/aggregation/datasketches/theta/SketchEstimatePostAggregator.java (V144)` `013566ade9/files/extensions-core/datasketches/src/main/java/io/druid/query/aggregation/datasketches/theta/SketchMergeAggregatorFactory.java (V164)`	2017-04-26 14:56:25 -07:00
Akash Dwivedi	a2419654ea	Allow hadoop configurations using runtime properties. (#4189 )	2017-04-26 00:05:27 +05:30
Gian Merlino	3b92220015	Reduce log spam from Avro decoders. (#4205 ) These objects get constructed semi-frequently (any time a parser is deserialized) and so info logs are spammy. They'll still appear in task logs at least once, since they're part of the task definition and will get logged due to that.	2017-04-25 23:59:59 +05:30
Benedict Jin	de815da942	Some code refactor for better performance of `Avro-Extension` (#4092 ) * 1. Collections.singletonList instead of Arrays.asList; 2. close FSDataInputStream/ByteBufferInputStream for releasing resource; 3. convert com.google.common.base.Function into java.util.function.Function; 4. other code refactoring * Put each param on its own line for code style * Revert GenericRecordAsMap back about `Function`	2017-04-25 12:46:32 +09:00
satishbhor	d51097c809	Fix lz4 library incompatibility in kafka-indexing-service extension (#4115 ) * Fix lz4 library incompatibility in kafka-indexing-service extension #3266 * Bumped Kafka version to 0.10.2.0 for : Fix lz4 library incompatibility in kafka-indexing-service extension #3266 * Replaced Lists.newArrayList() with Collections.singletonList() For Fix lz4 library incompatibility in kafka-indexing-service extension #4115	2017-04-25 12:23:51 +09:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191 )	2017-04-20 18:12:28 -07:00
Jerry Chung	0bcfd9354c	Fix S3 deep storage push and s3 insert-segment-to-db (#4174 ) * Fix S3 deep storage push and s3 insert-segment-to-db * Less verbose checks in S3DataSegmentFinder	2017-04-14 19:42:10 -07:00
Gian Merlino	b2954d5fea	Better groupBy error messages and docs around resource limits. (#4162 ) * Better groupBy error messages and docs around resource limits. * Fix BufferGrouper test from datasketches. * Further clarify.	2017-04-13 10:38:53 -07:00
Roman Leventov	15f3a94474	Copy closer into Druid codebase (fixes #3652 ) (#4153 )	2017-04-10 09:38:45 +09:00
Parag Jain	7e0d4c9555	secure supervisor endpoints (#3985 )	2017-04-05 16:42:32 -07:00
Roman Leventov	73d9b31664	GenericIndexed minor bug fixes, optimizations and refactoring (#3951 ) * Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream * Fixes * Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils * Move constructors * Add GenericIndexedBenchmark * Comments * Typo * Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe * Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap * Optimize TableLongEncodingWriter * Add checks to SerializerUtils methods * Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream() * Update GenericIndexedBenchmark * SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian * Add GenericIndexedBenchmark.indexOf() * More checks in methods in SerializerUtils * Use helperBuffer.arrayOffset() * Optimizations in SerializerUtils	2017-03-27 14:17:31 -05:00
Benedict Jin	23f77ebd20	Explain Avro's unnecessary EOFException (#4098 ) (#4100 ) * Explain Avro's unnecessary EOFException (#4098) * add jira link into log message	2017-03-24 10:45:45 -05:00
Gian Merlino	4b9f975f50	Rename SketchAggregationWithSimpleDataTest. (#4105 ) Tests that don't end in "Test" won't get run automatically by Maven.	2017-03-23 14:20:50 -07:00
Akash Dwivedi	ff7f90b02d	relocate method in BufferAggregator. (#4071 ) * relocate method in BufferAggregator. * Unused import. * Detailed javadoc. * using Int2ObjectMap. * batch relocate. * Revert batch relocate. * Unused import. * code comments. * code comment.	2017-03-23 13:07:59 -07:00
Roman Leventov	84fe91ba0b	Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798 ) (#3889 ) * Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing. * Use Execs.singleThreaded() * RuntimeShapeInspector to support nullable fields * Make CalledFromHotLoop annotation Inherited * Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory * Close InputStream in SpecializationService * Formatting * Test specialized PooledTopNScanners * Set flags in PooledTopNAlgorithm directly * Fix tests dependent on CountAggregatorFactory toString() form * Fix * Revert CountAggregatorFactory changes * Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector * Remove duplicate RoaringBitmap dependency in the extendedset pom.xml * Fix * Treat ByteBuffers specially in StringRuntimeShape * Doc fix * Annotate BufferAggregator.init() with CalledFromHotLoop * Make triggerSpecializationIterationsThreshold an int * Remove SpecializationService.PerPrototypeClassState.of() * Add comments * Limit the number of specializations that SpecializationService could make * Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions * Use more efficient ConcurrentMap's idioms in SpecializationService	2017-03-17 14:44:36 -05:00
Charles Allen	805d85afda	Allow compilation as Java8 source and target (#3328 ) * Allow compilation as Java8 source and target for everything except API * Remove conditions in tests which assume that we may run with Java 7 * Update easymock to 3.4 * Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration * Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity() * Remove java7 special for druid-api	2017-03-14 22:23:47 -06:00
Gian Merlino	3216134f8c	SQL: Make row extractions extensible and add one for lookups. (#3991 ) This is a reopening of #3989, since that PR was merged to master prematurely and accidentally.	2017-03-13 21:56:16 -07:00
Nishant Bangarwa	adbe89e7d6	Fix race in KafkaIndexTaskTest (#4031 ) task.pause(0) can return early before the task is actually paused. Exception for failure - java.lang.AssertionError: expected:<PAUSED> but was:<READING> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at io.druid.indexing.kafka.KafkaIndexTaskTest.testRunWithOffsetOutOfRangeExceptionAndPause(KafkaIndexTaskTest.java:1229) To reproduce, add Thread.sleep(10000) at the beginning of the KafkaIndexTask.possiblypause method.	2017-03-09 07:34:46 -08:00
Gian Merlino	4ca5270e88	Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004 ) * Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. Includes two fixes: - groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results. - Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y". Also includes doc and test fixes: - groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it is once again. - chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate. * Remove unused import. * Restore buffer size.	2017-03-06 12:27:02 -06:00
Akash Dwivedi	bebf9f34c7	HdfsDataSegmentPusher bug fix (#4003 ) * Fix for HdfsDataSegmentPusher. * Add missing loadspec in actual descriptor file. Tests to check actual content of descriptor file.	2017-03-06 00:53:44 -08:00
Gian Merlino	df623ebfe3	Fix a couple bugs due to calling Period.getMillis(). (#4006 )	2017-03-05 18:44:20 +05:30
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Gian Merlino	e63eefd7ff	Revert "SQL: Make row extractions extensible and add one for lookups. (#3989 )" The PR was merged to master accidentally. This reverts commit `23927a3c96`.	2017-03-01 17:06:12 -08:00
Gian Merlino	23927a3c96	SQL: Make row extractions extensible and add one for lookups. (#3989 ) * SQL: Make row extractions extensible and add one for lookups. * Fix QuantileSqlAggregatorTest.	2017-03-01 17:03:43 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
Akash Dwivedi	91344cbe57	Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987 ) * Enable GenericIndexed V2 for complex columns. * SerializerBuilder to use GenericColumnSerializer.	2017-02-28 22:06:54 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-granularity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Gian Merlino	f21641f0dc	Fix over-optimistic log message. (#3963 ) "Wrote task log" could be logged before the output stream is flushed and closed, which could generate an error and not actually write the log.	2017-02-22 15:02:53 -08:00
Parag Jain	edb032b96d	add datasource in intermediate segment path (#3961 )	2017-02-22 16:31:00 -06:00
Gian Merlino	985203b634	Finalize fields in postaggs (#3957 ) * initial commits for finalizeFieldAccess #2433 * fix some bugs to run a query * change name of method Queries.verifyAggregations to Queries.prepareAggregations * add Uts * fix Ut failures * rebased to master * address comments and add a Ut for arithmetic post aggregators * rebased to the master * address the comment of injection within arithmetic post aggregator * address comments and introduce decorate() in the PostAggregator interface. * Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name. * Fix a code style mismatch. * Rebased to the master	2017-02-21 16:32:14 -08:00
Gian Merlino	16ef513c7d	SQL: Add context and contextual functions to planner. (#3919 ) * SQL: Add context and contextual functions to planner. Added support for context parameters specified as JDBC connection properties or a JSON object for SQL-over-JSON-over-HTTP. Also added features that depend on context functionality: - Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions. - Added support for time zones other than UTC via a "timeZone" context. - Pass down query context to Druid queries too. Also some bug fixes: - Fix DATE handling, it was largely done incorrectly before. - Fix CAST(__time TO DATE) which should do a floor-to-day. - Fix non-equality comparisons to FLOOR(__time TO X). - Fix maxQueryCount property. * Pass down context to nested queries too.	2017-02-15 14:09:14 -08:00
Gian Merlino	78b0d134ae	Require Java 8 and include some Java 8 dependencies. (#3914 ) * Require Java 8 and include some Java 8 dependencies. - Upgrade Jetty to 9.3.16.v20170120. - Upgrade DataSketches to 0.8.4. - Bundle caffeine-cache by default. - Still target Java 7 when compiling base Druid classes. * Update cluster, quickstart docs. * Remove oraclejdk7 from travis.yml.	2017-02-14 12:51:51 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
Parag Jain	1f263fe50b	alert when resetting offsets (#3931 ) * alert when resetting offsets * add more data to alerts	2017-02-13 13:49:24 -08:00
michaelschiff	c1eee9bbf3	modified "end" column to `end` (#3903 ) * modified "end" column to `end`. "end" is interpreted as a string rather than dereferencing the column value * SQLMetadataConnector.getQuoteString defines the string that should be used to quote string fields * positional arguments for String.format * for Connectors that use " need to include the \ escape as well	2017-02-13 12:36:27 -08:00
Jihoon Son	991e2852da	Add PostAggregators to generator cache keys for top-n queries (#3899 ) * Add PostAggregators to generator cache keys for top-n queries * Add tests for strings * Remove debug comments * Add type keys and list sizes to cache key * Make post aggregators used for sort are considered for cache key generation * Use assertArrayEquals() * Improve findPostAggregatorsForSort() * Address comments * fix test failure * address comments	2017-02-13 12:23:44 -08:00
Parag Jain	8e31a465ad	report hand off count finite appenderator driver (#3925 )	2017-02-13 10:41:24 -08:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
Parag Jain	1aabb45a09	auto reset option for Kafka Indexing service (#3842 ) * auto reset option for Kafka Indexing service in case message at the offset being fetched is not present anymore at kafka brokers * review comments * review comments * reverted last change * review comments * review comments * fix typo	2017-02-02 14:57:45 -06:00
Nishant Bangarwa	a457cded28	Druid Extension to enable Authentication using Kerberos. (#3853 ) * Add extension for supporting kerberos security - This PR adds an extension for supporting druid authentication via Kerberos. - Working on the docs. * Add docs * review comments * more review comments * Block all paths by default * more review comments - use proper Oid * Allow extensions to override httpclient for integration tests * Add kerberos lock to prevent multithreaded issues. * review comment - remove enabled flag and fix router injection * Add Cookie Handling and more detailed docs * review comment - rename DruidKerberosConfig -> AuthKerberosConfig * review comments * fix travis failure on jdk7	2017-02-02 14:55:21 -06:00
Charles Allen	a73f1c9c70	Make s3 work better (#3898 )	2017-02-02 10:04:30 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	ac84a3e011	SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868 ) * SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE. * Fix bug with re-use of filtered approximate histogram aggregators. Also add APPROX_QUANTILE tests for filtering and running on complex columns. Includes some slight refactoring to allow tests to make DruidTables that include complex columns. * Remove unused import	2017-01-25 18:39:26 -08:00
Parag Jain	b3dae0efc3	catch all errors (#3844 )	2017-01-24 18:01:30 -07:00
Gian Merlino	d51f5e058d	SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852 ) * SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. Switched from CalciteConnection to Planner, bringing benefits: - CalciteConnection's JDBC interface no longer sits between the SQL server (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use Druid Sequence objects directly, reducing overhead in the query return path. - Implemented our own Planner-based Avatica Meta, letting us control connection timeouts and connection / statement limits. The previous CalciteConnection-based implementation didn't have any limits or timeouts. - The Planner interface lets us override the operator table, opening up SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT in core, and a QUANTILE aggregator in the druid-histogram extension. Also: - Added INFORMATION_SCHEMA metadata schema. - Added tests for Unicode literals and escapes. * Verify statement is actually open before closing it. * More detailed INFORMATION_SCHEMA docs.	2017-01-19 16:32:20 -08:00
Akash Dwivedi	e550d48772	Using fully qualified hdfs path. (#3705 ) * Using fully qualified hdfs path. * Review changes. * Remove unused imports. * Variable name change.	2017-01-17 14:40:22 -06:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Roman Leventov	49d71e9b38	Fix the build after #3697 (#3807 )	2016-12-26 17:06:48 -06:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Roman Leventov	76cb06a8d8	Lookup cache refactoring (the main part of #3667 ) (#3697 ) * Lookup cache refactoring (the main part of druid-io/druid#3667) * Use PowerMock's static methods in NamespaceLookupExtractorFactoryTest * Fix KafkaLookupExtractorFactoryTest * Use VisibleForTesting annotation instead of Javadoc comment * Create a NamespaceExtractionCacheManager separately for each test in NamespaceExtractionCacheManagersTest * Rename CacheScheduler.NoCache.ENTRY_DISPOSED to ENTRY_CLOSED * Reduce visibility of NamespaceExtractionCacheManager.cacheCount() and monitor() implementations, and don't run NamespaceExtractionCacheManagerExecutorsTest with off-heap cache (it didn't before) * In NamespaceLookupExtractorFactory, use safer idiom to check if CacheState is NoCache or VersionedCache * More logging in CacheHandler constructor and close(), VersionedCache.close() * PR comments addressed * Make CacheScheduler.EntryImpl AutoCloseable, avoid 'dispose' verb in comments, logging and naming in CacheScheduler in favor of 'close' * More Javadoc comments to CacheScheduler * Fix NPE * Remove logging in OnHeapNamespaceExtractionCacheManager.expungeCollectedCaches() * Make NamespaceExtractionCacheManagersTest.testRacyCreation() to have similar load to what it be before the refactoring * Unwrap NamespaceExtractionCacheManager.scheduledExecutorService from unneeded MoreExecutors.listeningDecorator() and specify that this is ScheduledThreadPoolExecutor, which ensures happens-before between periodic runs of the tasks * More comments on MapDbCacheDisposer.disposed * Replace concat with Long.toString() * Comment on why NamespaceExtractionCacheManager.scheduledExecutorService() returns ScheduledThreadPoolExecutor * Place logging statements in VersionedCache.close() and CacheHandler.close() after actual closing logic, because logging may fail * Make JDBCExtractionNamespaceCacheFactory and StaticMapExtractionNamespaceCacheFactory to try to close newly created VersionedCache if population has failed, as it is done already in URIExtractionNamespaceCacheFactory * Don't close the whole CacheScheduler.Entry, if the cache update task failed * Replace AtomicLong updateCounter and firstRunLatch with Phaser-based UpdateCounter in CacheScheduler.EntryImpl	2016-12-23 18:04:27 -08:00
Himanshu	4ca3b7f1e4	overlord helpers framework and tasklog auto cleanup (#3677 ) * overlord helpers framework and tasklog auto cleanup * review comment changes * further review comments addressed	2016-12-21 15:18:55 -08:00
Gian Merlino	6440ddcbca	Fix #3795 (Java 7 compatibility). (#3796 ) * Fix #3795 (Java 7 compatibility). Also introduce Animal Sniffer checks during build, which would have caught the original problems. * Add Animal Sniffer on caffeine-cache for JDK8.	2016-12-21 10:19:13 -08:00
David Lim	0b9dff0bc1	fix worker thread pool exhaustion bug (#3760 ) * fix worker thread pool exhaustion bug * code review changes * code review changes	2016-12-09 15:23:11 -08:00
David Lim	7f087cdd3b	allow Kafka consumer group.id to be overridden by config (#3765 )	2016-12-08 15:53:13 -08:00
Charles Allen	27ab23ef44	Don't update segment metadata if archive doesn't move anything (#3476 ) * Don't update segment metadata if archive doesn't move anything * Fix restore task to handle potential null values * Don't try to update empty metadata * Address review comments * Move to druid-io java-util	2016-12-01 07:49:28 -08:00
Parag Jain	7ee6bb7410	option to reset offset automatically in case of OffsetOutOfRangeException (#3678 ) * option to reset offset automatically in case of OffsetOutOfRangeException if the next offset is less than the earliest available offset for that partition * review comments * refactoring * refactor * review comments	2016-11-21 16:29:46 -06:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
Roman Leventov	988d97b09c	Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache() (part of #3667 ) (#3668 ) * Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache() * Fix tests	2016-11-11 17:25:41 -08:00
Himanshu	ddc078926b	consolidate different theta sketch representations into SketchHolder (#3671 )	2016-11-11 10:20:41 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Nicolas Colomer	37ecffb648	Add support for Confluent Schema Registry in the avro extension (#3529 )	2016-11-08 16:10:45 -06:00
Gian Merlino	657e4512d2	Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660 ) Excludes tests from AvoidStaticImport, since those are used often there and I didn't want to make this changeset too large. Production code use was minimal and I switched those to non-static imports.	2016-11-05 11:34:36 -07:00
Roman Leventov	22b57ddd60	Make ExtractionNamespaceCacheFactory to populate cache directly instead of returning callable (#3651 ) * Rename ExtractionNamespaceCacheFactory.getCachePopulator() to populateCache() and make it to populate cache itself instead of returning a Callable which populates cache, because this "callback style" is not actually needed. ExtractionNamespaceCacheFactory isn't a "factory" so it should be renamed, but renaming right in this commit would tear the git history for files, because ExtractionNamespaceCacheFactory implementations have too many changed lines. Going to rename ExtractionNamespaceCacheFactory to something like "CachePopulator" in one of subsequent PRs. This commit is a part of a bigger refactoring of the lookup cache subsystem. * Remove unused line and imports	2016-11-04 13:33:16 -07:00
Gian Merlino	4203580290	URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512 ) * URIExtractionNamespace: Treat null values in lookup maps as missing entries. This is useful when many logical lookups are derived from the same base JSON file, and some lookups' values may be unknown sometimes. * Add test, logging message, and address other comments. * Update docs.	2016-11-03 13:53:04 -07:00
Himanshu	2362effd8c	use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650 )	2016-11-02 15:58:55 -07:00
Roman Leventov	36a1543222	Lookup cache bug fixes (#3609 ) * Return better lastVersion from JDBCExtractionNamespaceCacheFactory's cache populator callable * Return the lastVersion if URI lookup last modified date is not later than the last cached, from URIExtractionNamespaceCacheFactory's cache populator callable * Fix a race condition in NamespaceExtractionCacheManager.cancelFuture() * Don't delete cache from NamespaceExtractionCacheManager if the ExtractionNamespaceCacheFactory returned the same version as the last; Better exception treatment in the scheduled cache updater runnable in NamespaceExtractionCacheManager (in particular, don't consume Errors); throw AssertionError in StaticMapExtractionNamespaceCacheFactory if the lastVersion != null) * In NamespaceExtractionCacheManager, put NamespaceImplData.latestVersion update in the same synchronized() block with swapAndClearCache(id, cacheId); Turn getPostRunnable which returns a callback into a simple updateNamespace() method * In StaticMapExtractionNamespaceCacheFactory.getCachePopulator(), check the input directly, not inside a callback * In URIExtractionNamespaceCacheFactory, allow URI last modified time to go backwards * Better logging in NamespaceExtractionCacheManager * Add comment on lastVersion nullability in URIExtractionNamespaceCacheFactory	2016-11-02 09:40:19 -07:00
Himanshu	eb70a12e43	fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636 )	2016-11-01 12:45:38 -05:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Charles Allen	78159d7ca4	Move off-heap QTL global cache delete lock outside of subclass lock (#3597 ) * Move off-heap QTL global cache delete lock outside of subclass lock * Make `delete` thread safe	2016-10-27 22:23:53 +05:30
David Lim	3c56cbdf82	fix timing issue with KafkaLookupExtractorFactoryTest (#3604 )	2016-10-25 07:04:51 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
David Lim	c2ae734848	KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534 ) * allow run thread to gracefully complete instead of interrupting when stopGracefully() is called * add comments	2016-10-17 10:52:19 -04:00
Gian Merlino	c1d3b8a30c	Remove dropwizard-jdbc dependency from lookups-cached-single. (#3573 ) Fixes #3548.	2016-10-17 10:37:47 -04:00
Gian Merlino	0ce33bc95f	HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577 ) Fixes #3576.	2016-10-17 10:37:18 -04:00
David Lim	472c409b99	KafkaLookupExtractorFactory: shutdown kafka consumer on close() (#3539 ) * shutdown kafka consumer on close * handle close() race condition	2016-10-15 09:55:51 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
jaehong choi	6f21778364	Support finding segments in AWS S3. (#3399 ) * support finding segments from a AWS S3 storage. * add more Uts * address comments and add a document for the feature. * update docs indentation * update docs indentation * address comments. 1. add a Ut for json ser/deser for the config object. 2. more informant error message in a Ut. * address comments. 1. use @Min to validate the configuration object 2. change updateDescriptor to a string as it does not take an argument otherwise * fix a Ut failure - delete a Ut for testing default max length.	2016-10-10 17:27:09 -07:00
Parag Jain	c255dd8b19	fix datasegment metadata (#3555 )	2016-10-07 16:30:33 -05:00
Parag Jain	76a60a007e	create parent dir on HDFS if it does not exist (#3547 )	2016-10-06 16:14:00 -07:00
Himanshu	1523de08fb	SketchAggregatorFactory.combine(..) returns Union object now so that it can be reused across multiple combine(..) calls (#3471 )	2016-10-05 08:40:14 -07:00
Parag Jain	592903571a	add context to kafka supervisor for the kafka indexing task (#3464 )	2016-10-04 20:08:43 -05:00
Parag Jain	e419407eba	handle supervisor spec metadata failures (#3456 ) close kafka consumer in case supervisor start fails	2016-10-04 10:15:28 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Parag Jain	15c9918c65	log exceptions while trying to pause task (#3504 )	2016-09-23 16:53:23 -07:00
David Lim	9226d4af3c	configurable shutdownTimeout for Kafka supervisor (#3497 ) * configurable shutdownTimeout * cr change	2016-09-23 13:26:45 -06:00
David Lim	ca9114b41b	add supervisor reset API (#3484 ) * add supervisor reset API * CR doc changes and kill running tasks / clear offsets from supervisor	2016-09-22 17:51:06 -07:00
Nishant	6099d20303	[FIX] ReleaseException when the path is being written by multiple tasks (#3494 ) * fix ReleaseException when the path is being written by multiple tasks * Do not throw IOException if another replica wins the race for segment creation fix if check * handle logging comments * fix test	2016-09-22 14:25:41 -05:00
Navis Ryu	74e1243c7e	Fix test fail of PollingLookupTest.testApplyAfterDataChange (#3489 )	2016-09-22 08:33:59 -07:00
Himanshu	05ea88df5c	fix kafka-indexing-service pom to not reference specific version but parent version for druid core dependencies (#3472 )	2016-09-20 15:18:21 -07:00
David Lim	96fcca18ea	update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452 )	2016-09-20 22:51:15 +05:30
Slim	3175e17a3b	Cached lookup module. first cut implementing JDBC cache (#2819 )	2016-09-16 13:45:54 -07:00
Charles Allen	95e08b38ea	[QTL] Reduced Locking Lookups (#3071 ) * Lockless lookups * Fix compile problem * Make stack trace throw instead * Remove non-germane change * * Add better naming to cache keys. Makes logging nicer * Fix #3459 * Move start/stop lock to non-interruptable for readability purposes	2016-09-16 11:54:23 -07:00
Gleb Smirnov	d981a2aa02	Avoid interrupting ZookeeperConsumerConnector.shutdown() #3346 (#3403 )	2016-09-14 17:44:27 -07:00
Himanshu	a069257d37	avro-extension -- feature to specify multiple avro reader schemas inline (#3368 ) * rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder * feature to specify multiple schemas inline in avro module	2016-09-13 14:54:31 -07:00
Gian Merlino	bcff08826b	KafkaIndexTask: Treat null values as unparseable. (#3453 )	2016-09-13 10:56:38 -07:00
Slim	ba6ddf307e	Adding hadoop kerberos authentication. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Alexander Saydakov	1a5042ca26	updated dependency on sketches-core (#3443 ) * updated dependency on sketches-core to 0.7.0 * Use sketches-core-0.4.1, which is the latest version still compatible with JDK7	2016-09-09 16:21:32 -07:00
David Lim	146a17de48	KafkaIndexTask: allow pause to break out of retry loop (#3401 )	2016-09-06 22:29:37 -06:00
David Lim	5b1ae21bd1	retry calls to getStartTime (#3429 )	2016-09-06 14:02:22 -07:00
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
David Lim	ed924bf214	allow registrants to opt out of announcing themselves when registering as a chat handler (#3360 )	2016-08-16 10:51:28 +05:30
Himanshu	70d99fe3c6	Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest (#3363 ) or else the test fails if run independently.	2016-08-15 10:19:33 -07:00
Himanshu	46da682231	avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249 )	2016-08-10 10:49:26 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gleb Smirnov	33dbe0800c	Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326)	2016-08-04 10:24:19 -07:00
Gian Merlino	8030f1cb67	Be more respectful of maxRowsInMemory. (#3284) - Appenderator: Respect maxRowsInMemory across all sinks. - KafkaIndexTask: Respect maxRowsInMemory across all partitions.	2016-07-26 15:02:35 -06:00
Charles Allen	3f1681c16c	Caffeine cache extension (#3028) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Charles Allen	bfa5c05aaa	Make global lookup cache introspector class public (#3199) * Make global lookup cache introspector class public * Fixes #3187 * Make KafkaLookupExtractorIntrospectionHandler a public static class	2016-07-01 15:50:57 -07:00
Xavier Léauté	485e381387	remove datasource from hadoop output path (#3196) fixes #2083, follow-up to #1702	2016-06-29 08:53:45 -07:00
David Lim	1d40df4bb7	fix kafka consumer concurrent access during shutdown (#3193)	2016-06-28 13:23:17 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166)	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
du00cs	ebd654228b	fix: avro types exception in sketch (#3167)	2016-06-22 15:54:20 -05:00
Charles Allen	674f94083e	Add more logging around failed S3DataSegmentMover DeleteExceptions (#3104) * Add more logging around failed S3DataSegmentMover DeleteExceptions * Fix test NPE	2016-06-16 14:58:33 -07:00
Charles Allen	f7fa1d8c62	[QTL] Allow S3 version finder to search entire s3 object key (#3139) * Allow S3 version finder to search entire s3 object key * Previously only was able to search immediate "directory" * Update method javadoc * Expand docs a bit better	2016-06-13 21:02:28 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133)	2016-06-13 13:10:38 -07:00
David Lim	4faa298977	update kafka client for kafka indexing service to 0.9.0.1 (#3109)	2016-06-08 06:51:03 -07:00
Charles Allen	8cac710546	Async lookups-cached-global by default (#3074) * Async lookups-cached-global by default * Also better lookup docs * Fix test timeouts * Fix timing of deserialized test * Fix problem with 0 wait failing immediately	2016-06-03 15:58:10 -05:00
David Lim	a2290a8f05	support seamless config changes (#3051)	2016-06-03 13:50:19 -07:00
Charles Allen	447033985e	Make S3DataSegmentMover not bother checking for items if they are the same (#3032) * Make S3DataSegmentMover not bother checking for items if they are the same	2016-06-02 17:27:21 +01:00
David Lim	f6c39cc844	Kafka task minimum message time (#3035) * add KafkaIndexTask support for minimumMessageTime * add Kafka supervisor support for lateMessageRejectionPeriod	2016-05-31 11:37:00 -07:00
David Lim	3ef24c03b3	Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006) * validate X-Druid-Task-Id header in request and add header to response * modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant	2016-05-25 22:05:18 -07:00
Charles Allen	8024b915e2	[QTL] Implement LookupExtractorFactory of namespaced lookup (#2926) * support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespecd lookup extensions - druid-namespace-lookup and druid-kafka-extraction-namespace are modified - However, druid-namespace-lookup still has configuration about ON/OFF HEAP cache manager selection, which is not namespace wide configuration but node wide configuration as multiple namespace shares the same cache manager * update KafkaExtractionNamespaceTest to reflect argument signature changes * Add more synchronization functionality to NamespaceLookupExtractorFactory * Remove old way of using extraction namespaces * resolve compile error by supporting LookupIntrospectHandler * Remove kafka lookups * Remove unused stuff * Fix start and stop behavior to be consistent with new javadocs * Remove unused strings * Add timeout option * Address comments on configurations and improve docs * Add more options and update hash key and replaces * Move monitoring to the overriding classes * Add better start/stop logging * Remove old docs about namespace names * Fix bad comma * Add `@JsonIgnore` to lookup factory * Address code review comments * Remove ExtractionNamespace from module json registration * Fix problems with naming and initialization. Add tests * Optimize imports / reformat * Fix future not being properly cancelled on failed initial scheduling * Fix delete returns * Add more docs about whole introspection * Add `/version` introspection point for lookups * Add more tests and address comments * Add StaticMap extraction namespace for testing. Also add a bunch of tests * Move cache system property to `druid.lookup.namespace.cache.type` * Make VERSION lower case * Change poll period to 0ms for StaticMap * Move cache key to bytebuffer * Change hashCode and equals on static map extraction fn * Add more comments on StaticMap * Address comments * Make scheduleAndWait use a latch * Sanity renames and fix imports * Remove extra info in docs * Fix review comments * Strengthen failure on start from warn to error * Address comments * Rename namespace-lookup to lookups-cached-global * Fix injective mis-naming * Also add serde test	2016-05-24 10:56:40 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Himanshu	d3e9c47a5f	use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943)	2016-05-13 10:06:31 +05:30
Slim	45b2e65d75	[QTL] adding listDelimiter to lookup parser spec (#2941) * adding listDelimiter to lookup parser spec * cleaning code	2016-05-10 15:41:16 +05:30
Charles Allen	90b0b0a4ad	Make URIExtraction not require FileSystem impls for URIs it understands (#2929) * Make URIExtraction not require FileSystem impls for URIs it understands * Fixes #2928 * Preserve URI information * Simply case for exact matching * Move unused variable	2016-05-08 23:23:53 +05:30
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Charles Allen	2a769a9fb7	Make S3DataSegmentPuller do GET requests less often (#2900) * Make S3DataSegmentPuller do GET requests less often * Fixes #2894 * Run intellij formatting on S3Utils * Remove forced stream fetching on getVersion * Remove unneeded finalize * Allow initial object fetching to fail and be retried	2016-05-04 16:21:35 -07:00
Gian Merlino	f8ddfb9a4b	Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922) Fixes #2912.	2016-05-04 13:54:34 -07:00
Charles Allen	6b957aa072	[QTL] Make URI Exctraction Namespace take more sane arguments (#2738) * Make URI Exctraction Namespace take more sane arguments * Fixes https://github.com/druid-io/druid/issues/2669 * Update docs * Rename error message * Undo overzealous deletion of docs * Explain caching mechanism a bit more in docs	2016-05-02 12:54:34 -07:00
Charles Allen	54b717bdc3	[QTL] Move kafka-extraction-namespace to the Lookup framework. (#2800) * Move kafka-extraction-namespace to the Lookup framework. * Address comments * Fix missing kafka introspection * Fix tests to be less racy * Make testing a bit more leniant * Make tests even more forgiving * Add comments to kafka lookup cache method * Move startStopLock to just use started * Make start() and stop() idempotent * Forgot to update test after last change, test now accounts for idempotency * Add extra idempotency on stop check * Add more descriptive docs of behavior	2016-05-02 09:45:13 -07:00
Gian Merlino	67b47c982f	Datasketches: Remove isInputThetaSketch from cache key. (#2899)	2016-04-28 18:14:52 -07:00
Gian Merlino	16080dc54f	Adjust colliding aggregator cache IDs. (#2891) - Renumbered ApproximateHistogramAggregatorFactory from 8 to 12, 8 was taken by CardinalityAggregatorFactory - Renumbered ApproximateHistogramFoldingAggregatorFactory from 9 to 13, 9 was taken by FilteredAggregatorFactory	2016-04-28 10:11:33 -07:00
Gian Merlino	909abd17f3	Sketch cache key should include size, isInputThetaSketch. (#2893)	2016-04-28 10:10:46 -07:00
David Lim	7641f2628f	add control and status endpoints to KafkaIndexTask (#2730)	2016-04-21 15:34:59 -07:00
Xavier Léauté	5938d9085b	Stream segments from database (#2859) * Avoids fetching all segment records into heap by JDBC driver * Set connection to read-only to help database optimize queries * Update JDBC drivers (MySQL has fixes for streaming results)	2016-04-21 05:40:07 +08:00
Gian Merlino	08c784fbf6	KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844) segment creation deterministic. This means that each segment will contain data from just one Kafka partition. So, users will probably not want to have a super high number of Kafka partitions... Fixes #2703.	2016-04-18 22:29:52 -07:00
Xavier Léauté	0f8a037bcd	support PostgreSQL >= 9.5 upsert capability	2016-04-01 16:53:27 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Himanshu	f26e73d133	Merge pull request #2720 from gianm/druid-api Move druid-api into the druid repo.	2016-03-24 15:51:10 -05:00
Gian Merlino	7e7a886f65	Move druid-api into the druid repo. This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf in the druid-api repo.	2016-03-24 11:04:34 -07:00
Himanshu Gupta	4aead38130	fix SketchEstimate post aggregator's getComparator() and test changes to verify same	2016-03-24 10:11:06 -05:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Slim	cf342d8d3c	Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility [QTL][Lookup] lookup module with the snapshot utility	2016-03-17 11:39:47 -05:00
Slim Bouguerra	0c86b29ef0	lookup module with the snapshot utility	2016-03-17 09:20:41 -05:00
Charles Allen	02805a74a1	Merge pull request #2648 from chtefi/master Ignore case when testing for table existence	2016-03-14 13:57:53 -07:00
Stéphane Derosiaux	416cb03687	Ignore case when testing for table existence	2016-03-13 11:17:30 +01:00
Gian Merlino	f22fb2c2cf	KafkaIndexTask. Reads a specific offset range from specific partitions, and can use dataSource metadata transactions to guarantee exactly-once ingestion. Each task has a finite lifecycle, so it is expected that some process will be supervising existing tasks and creating new ones when needed.	2016-03-10 18:41:43 -08:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Nishant	ba1185963b	Fix a bunch of dependencies * Eliminate exclusion groups from pull-deps * Only consider dependency nodes in pull-deps if they are not in the following scopes * provided * test * system * Fix a bunch of `<scope>provided</scope>` missing tags * Better exclusions for a couple of problematic libs	2016-03-10 10:18:08 -08:00
fjy	e3e932a4d4	refactor extensions into core and contrib	2016-03-08 17:12:09 -08:00

1352 Commits