Commit Graph

159 Commits

Author SHA1 Message Date
Xavier Léauté 8a21b4cae3 Merge pull request #1697 from metamx/betterMissingQTLLogging
Better logging of URIExtractionNamespace failures due to missing files
2015-09-15 15:29:27 -07:00
Charles Allen f5ed6e885c Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Fangjin Yang 34ef81572d Merge pull request #1700 from himanshug/update_agg_test_helper
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Himanshu Gupta b989a7054c fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
aka workaround for https://issues.apache.org/jira/browse/HDFS-8750
2015-09-11 15:35:59 -05:00
Himanshu Gupta 67aa3dc153 on HDFS store segments in "dataSource/interval/.." and not in "dataSource/dataSource/interval.." 2015-09-09 11:12:01 -05:00
Himanshu Gupta 5da58e48e0 use Rule based TemporaryFolder for cleanup of temp directory/files 2015-09-09 11:10:33 -05:00
Charles Allen 1977ac9c5d Better logging of URIExtractionNamespace failures due to missing files 2015-09-08 13:33:32 -07:00
Charles Allen 0b8a3035c6 Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest 2015-09-04 13:02:14 -07:00
Nishant 0096e6a0a0 Merge pull request #1658 from metamx/cleanupJDBCExtractionNamespaceTest
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-09-02 23:49:49 +05:30
Xavier Léauté 82f9ecf56b Merge pull request #1620 from metamx/longFriendlyQTL
Allow long values in the key or value fields for URIExtractionNamespace
2015-09-02 10:55:35 -07:00
cheddar 4f61b42f40 Merge pull request #1578 from b-slim/fix_extraction_filter_2
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen ac8e32b58e Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest 2015-08-26 23:05:51 -07:00
Charles Allen b24a88b328 Allow long values in the key or value fields for URIExtractionNamespace 2015-08-26 09:44:03 -07:00
Fangjin Yang 33b862166a Merge pull request #1659 from himanshug/segment_kill_update
on kill segment, don't leave version, interval and dataSource dir behind on HDFS
2015-08-26 07:23:20 -07:00
Xavier Léauté c4d0e8d29b remove unnecessary pom verbiage 2015-08-25 16:07:03 -07:00
Gian Merlino 2bf9a70bfa Consolidate SQL retrying by moving logic into the connectors.
Also change boolean removeLock to void addLock in MetadataStorageActionHandler.
2015-08-25 12:42:29 -07:00
Himanshu Gupta 5b5a76ef6c adding unit test for HdfsDataSegmentKiller.testKill(..) 2015-08-23 22:21:03 -05:00
Himanshu Gupta c2bebfe39e delete version, interval, dataSource directories on segment deletion if possible, so that they are not left behind and consume ns quota on HDFS 2015-08-23 22:06:12 -05:00
Himanshu Gupta 9b54124cd0 pseudo integration tests for approximate histogram 2015-08-20 01:27:20 -05:00
Xavier Léauté 1abcd75696 Merge pull request #1624 from metamx/expandTimeouts
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 21:32:50 -07:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Charles Allen 38110820c3 Expand timeouts on JDBCExtractionNamespaceTest 2015-08-18 14:28:40 -07:00
Charles Allen db19d2d547 Revert "Update to guice 4.0" 2015-08-14 09:26:07 -07:00
Charles Allen 76fbb12959 Increase timeout in tests for NamespaceExtractionCacheManagerExecutorsTest 2015-08-11 13:54:54 -07:00
Charles Allen 7e61216287 Update to guice 4.0
- Mark a lot of `@Provides` methods as final since guice 4.0 disallows overriding them
2015-08-10 13:57:18 -07:00
Charles Allen 8be82c00bd Better handling of slow stuff in NamespaceExtractionCacheManagerExecutorsTest 2015-08-07 15:11:54 -07:00
Charles Allen e6226968a6 Merge pull request #1589 from druid-io/fix-firehose-doc
Add a lot more docs for firehoses
2015-08-06 12:45:24 -07:00
Charles Allen 8cdcf69714 Better handle timeouts in namespace tests 2015-08-06 10:20:18 -07:00
fjy 012fff6616 fix firehose docs 2015-08-04 09:52:23 -07:00
Slim Bouguerra 7848429cbf unused imports 2015-08-03 14:50:52 -05:00
Fangjin Yang 22567946cf Merge pull request #1259 from metamx/queryTimeLookup
Query Time Lookup
2015-07-28 11:43:05 -10:00
Himanshu cc50217eb0 Merge pull request #1568 from metamx/detailedSegmentLoadingErrors
More detailed error logging on segment activities
2015-07-28 13:31:16 -05:00
Charles Allen 86ede702b1 Add namespaced lookups as extensions
* Adds kafka, URI, and JDBC namespace definitions
* Add ability to explicitly rename using a "namespace", which is a particular data collection that is loaded on all realtime nodes, historical nodes, and brokers. If any of these nodes has the namespace extension, ALL nodes have the namespace extension.
* Add namespace caching and populating (can be on heap or off heap)
* Add NamespaceExtractionCacheManager for handling caches
* Added ExtractionNamespace for handling metadata on the extraction namespaces
* Added ExtractionNamespaceUpdate for handling metadata related to updates
* Add extension which caches renames from a kafka stream (requires kafka8)
* Added README.md for the namespace kafka extension
* Added docs
* Added namespace/size, namespace/count, namespace/deltaTasksStarted metrics

Add static config for namespaces via `druid.query.extraction.namespace`
* This is a rebase of https://github.com/b-slim/druid/tree/static_config_only
2015-07-28 11:14:14 -07:00
Charles Allen c492d4448d More detailed S3DataSegmentKiller error messages 2015-07-27 13:45:03 -07:00
Charles Allen fe7818ddd2 More detailed AzureDataSegmentKiller error messages 2015-07-27 13:44:59 -07:00
Charles Allen 3f901e7291 More detailed logging of error message on S3DataSegmentMover 2015-07-27 13:28:54 -07:00
Charles Allen e051e93d19 Merge pull request #1518 from RealROI/more-azure-features
Azure Blob Store support for Firehose and Indexing Service Logs
2015-07-17 16:10:22 -07:00
Zak Kristjanson 0bda7af52c Add more support for Azure Blob Store
Azure Blob Store support for Task Logs and a firehose for data ingestion
2015-07-17 15:38:21 -07:00
Xavier Léauté 4cfb00bc8a increment version 2015-07-15 13:09:05 -07:00
Hao Xia 1931491c9f A couple of hdfs related fixes
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Xavier Léauté 0a5bb909a2 [maven-release-plugin] prepare for next development iteration 2015-06-18 17:35:19 -07:00
Xavier Léauté 59c6b2b279 [maven-release-plugin] prepare release druid-0.8.0-rc1 2015-06-18 17:35:14 -07:00
Charles Allen f48db09e35 Add optimizations for ExtractionFn by enabling MANY_TO_ONE vs ONE_TO_ONE codepaths
* Also adds LookupExtractionFn and MapLookupExtractor which takes in an explicit mapping of renames
* Add injective to javascript extraction fn
2015-06-02 12:22:56 -07:00
fjy be2a35188e Additional schema validations and better logs for common extensions 2015-05-27 16:25:02 -07:00
cheddar c1b1752595 Merge pull request #1383 from metamx/psql-transient
retry transient exceptions for PostgreSQL, fixes #1382
2015-05-22 13:01:53 -07:00
Xavier Léauté 6b23e02d2b retry transient exceptions for PostgreSQL, fixes #1382 2015-05-22 14:47:27 -04:00
flow 07659f30ab bug fix: hdfs task log and indexing task do not work properly with Hadoop HA 2015-05-21 20:49:42 +08:00
Xavier Léauté 3c3db7229c Merge pull request #1355 from himanshug/long_max_min_aggregators
Long max/min aggregators
2015-05-13 12:08:11 -07:00
Himanshu Gupta d0ec945129 adding aliases doubleMax and doubleMin for max and min respectively
renamed all [Max/Min]*.java to [DoubleMax/DoubleMin]*.java and created [Max/Min]AggregatorFactory.java, which can be removed when we don't need the min/max aggregator type backward compatibility
2015-05-13 09:25:41 -05:00
fjy 7a6acf5c1b update pom to 0.8 2015-05-11 19:41:58 -06:00
David Rodrigues 11a76169b4 Overall improvement on Azure Deep Storage extension.
* Remove hard-coded azure path manipulation from the puller.
* Fix segment size not being zero after uploading it to Azure.
* Remove both index and desc files only on a successful upload to Azure.
* Add Azure container name to load spec.
  This patch would help future-proof the azure deep-storage module and avoid
  having to introduce ugly backwards-compatibility fixes when we want to
  support multiple containers or moving data between containers.
2015-05-05 15:17:25 -07:00
Charles Allen 16a0c40d4c Fix concatenated gzip files in StaticS3FirehoseFactory 2015-04-24 15:06:28 -07:00
David Pinheiro baeef08c4c Add Microsoft Azure as a Deep Storage option. 2015-04-16 15:39:36 -07:00
Charles Allen abdeaa0746 Add stricter checking for potential coding errors
Can be used via `mvn clean compile test-compile -P strict`
2015-04-15 14:52:25 -07:00
Charles Allen b29816bddb Minor fix in hdfs-storage pom.xml 2015-04-08 14:29:16 -07:00
Fangjin Yang 208e307915 Merge pull request #1251 from metamx/uriSegmentLoaders
Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
2015-03-30 17:43:51 -07:00
fjy aea7f9d192 [maven-release-plugin] prepare for next development iteration 2015-03-30 16:35:24 -07:00
fjy 060d7aef03 [maven-release-plugin] prepare release druid-0.7.1 2015-03-30 16:35:20 -07:00
Charles Allen 1c6cbea89c Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
This reverts commit f904bc7858.
2015-03-30 13:40:04 -07:00
Fangjin Yang f904bc7858 Revert "Overhaul of SegmentPullers to add consistency and retries" 2015-03-30 13:15:50 -07:00
Charles Allen 6d407e8677 Add URI handling to SegmentPullers
* Requires https://github.com/druid-io/druid-api/pull/37
* Requires https://github.com/metamx/java-util/pull/22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
2015-03-30 12:33:23 -07:00
Prajwal Tuladhar fb7005435b use ByteSink and ByteSource instead of OutputSupplier and InputSupplier
They are being deprecated and will eventually be removed in Guava 18.0
2015-03-26 14:45:00 -04:00
Charles Allen 3ed4b19201 Update mysql-connector-java to 5.1.34 2015-03-23 15:43:34 -07:00
fjy b389cfe404 [maven-release-plugin] prepare for next development iteration 2015-03-19 12:38:17 -07:00
fjy 60e7d543cc [maven-release-plugin] prepare release druid-0.7.1-rc1 2015-03-19 12:38:13 -07:00
fjy 6a47c1530c update versions to prepare for rc release 2015-03-19 11:39:38 -07:00
Xavier Léauté 11b3230602 update to kafka 0.8.2.1, because it's better™ 2015-03-12 09:59:24 -07:00
Xavier Léauté 217e674063 Handling aggregators and post aggregators with duplicate names
* add test for same-name groupBy hyperUniques post-agg
* add test for same-name post-agg in groupby with approx histogram
* Fixes https://github.com/druid-io/druid/issues/1045
* Throws an error if post aggs and aggs do not have unique names
* Add more groupBy tests for Having filters
2015-03-10 17:10:43 -07:00
Fangjin Yang e8605c63a9 Merge pull request #1150 from himanshug/broker-parallel-chunk-process
interval chunk query runner now processes individual chunk in a threadpool
2015-03-02 13:50:23 -08:00
Himanshu Gupta 29039fd541 interval chunk query runner now processes individual chunk in a thread pool and prints metrics query/time per chunk 2015-03-02 15:45:09 -06:00
Xavier Léauté b167dcf82c [maven-release-plugin] prepare for next development iteration 2015-02-23 14:28:06 -08:00
Xavier Léauté e81ac2ba43 [maven-release-plugin] prepare release druid-0.7.0 2015-02-23 14:27:58 -08:00
Xavier Léauté 38e8dfdc98 replace Kafka 0.8.1.1 with 0.8.2.0 stable 2015-02-13 14:48:36 -08:00
Xavier Léauté 1971c1679c do not build kafka-seven extension by default 2015-02-13 14:32:47 -08:00
Xavier Léauté 78df7f6165 Move Druid release artifacts to Sonatype
- Switch to using Druid parent POM
- Add required fields for Sonatype
- Common plugin versions and settings have been moved to the parent pom
- Cleanup artifacts and POMs for consistent formatting
- Remove org.hyperic.sigar dependency and update docs to reflect necessary jars to add at runtime when sigar is needed
2015-02-13 14:26:31 -08:00
fjy d29740ed9f [maven-release-plugin] prepare for next development iteration 2015-02-12 16:16:00 -08:00
fjy 211fd15b7e [maven-release-plugin] prepare release druid-0.7.0-rc3 2015-02-12 16:15:56 -08:00
fjy 1f12c5b2f1 [maven-release-plugin] prepare for next development iteration 2015-02-03 12:06:49 -08:00
fjy e82d431be7 [maven-release-plugin] prepare release druid-0.7.0-rc2 2015-02-03 12:06:41 -08:00
Fangjin Yang 92e616de11 Merge pull request #1077 from metamx/remove-unused-imports
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51 ba932bb1f2 remove unused imports 2015-02-02 21:53:39 +05:30
fjy d05032b98a towards a community led druid 2015-01-31 20:57:36 -08:00
Xavier Léauté f00872c41b move common AWS related classes into a separate module 2015-01-29 13:55:49 -08:00
fjy 1f94de22c6 [maven-release-plugin] prepare for next development iteration 2015-01-20 14:23:55 -08:00
fjy 17476edc31 [maven-release-plugin] prepare release druid-0.7.0-rc1 2015-01-20 14:23:51 -08:00
Xavier Léauté c532d07635 Merge pull request #1011 from metamx/log4j2
Upgrade to log4j2
2015-01-20 12:51:07 -08:00
Nick Tate dd9b646037 Remove parser in the constructor of statics3 firehose as it is no longer needed with the new ingestion schemas 2015-01-20 12:45:58 -08:00
Charles Allen 3d27747f7e Upgrade to log4j2
Default behavior is as before.
Added documentation for how to enable synchronous logging for select chatty classes:
* io.druid.client.ServerInventoryView
* io.druid.client.BatchServerInventoryView
* io.druid.curator.inventory.CuratorInventoryManager
* com.metamx.http.client.pool.ChannelResourceFactory
2015-01-20 12:35:18 -08:00
Fangjin Yang 0ae737f383 Merge pull request #1039 from metamx/enforce-utf-8
ensure MySQL database defaults to UTF-8 on startup
2015-01-16 13:10:07 -08:00
Xavier Léauté c0de179aa0 ensure MySQL database defaults to UTF-8 on startup 2015-01-15 15:30:16 -08:00
Charles Allen b1b5c9099e Update all String conversions to and from byte[] to use the java-util StringUtils functions
* Speedup of GroupBy with javaScript filters by ~10%
* Requires https://github.com/metamx/java-util/pull/15
2015-01-05 11:22:32 -08:00
flow f13eab644a Keep HdfsTaskLogsConfig creator 2014-12-19 10:48:39 +08:00
flow a637a23eae fix issue #977 2014-12-18 19:09:13 +08:00
fjy 882874ce60 address cr 2014-12-16 11:37:38 -08:00
fjy b3999bbc6a update http client and fix log4j dependencies 2014-12-16 11:29:50 -08:00
Fangjin Yang d106096327 Revert "Revert "Support more AWS credentials providers for S3 storage"" 2014-12-15 19:27:28 -07:00
Fangjin Yang 714317a492 Revert "Support more AWS credentials providers for S3 storage" 2014-12-15 19:16:37 -07:00