Himanshu Gupta
338f88b86b
further simplifying the api, users just need to use thetaSketch as aggregator
2015-11-12 00:04:34 -06:00
Himanshu Gupta
88ae3c43f9
changing names to be explicit about theta sketch algorithm
...
old names are still valid though so as to be backwards compatible for now
2015-11-12 00:04:34 -06:00
Himanshu Gupta
817cf41f5c
druid aggregators based on datasketches lib http://datasketches.github.io/
2015-11-12 00:04:33 -06:00
Gian Merlino
e4e5f0375b
SegmentAllocateAction (fixes #1515)
...
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté
fa6142e217
cleanup and remove unused imports
2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a
Add backwards compatibility for PR #1922
2015-11-11 10:27:00 -08:00
Charles Allen
1df4baf489
Move Jackson Guice adapters into io.druid
...
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Charles Allen
929b981710
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-11-05 18:10:13 -08:00
Lou Marvin Caraig
c924f9fe56
Added cloudfiles-extensions in order to support Rackspace's cloudfiles as deep storage
2015-11-04 17:44:48 +01:00
Himanshu Gupta
e9cfb7f46f
refer to top level property for hadoop version instead of hardcoding 2.3.0
2015-10-26 15:51:48 -05:00
Xavier Léauté
e4ac78e43d
bump next snapshot to 0.9.0
2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37
update version to 0.8.3
2015-10-14 21:40:55 -07:00
Gian Merlino
e3bb93e8c7
Revert "Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2"
...
This reverts commit dae488b7c0, reversing changes made to 397be4b897.
2015-10-01 00:05:59 -04:00
Fangjin Yang
dae488b7c0
Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2
...
Fix failure in nested groupBy with multiple aggregators with same fie…
2015-09-30 22:28:34 -04:00
David Lim
70ae5ca922
Fix failure in nested groupBy with multiple aggregators with same fieldName
...
Version 2 - Throws an exception if an outer query references an
aggregator that doesn't exist in the inner query, and then uses the
inner query aggregator names to form the columns for the intermediate
incremental index.
Also deleted all the getRequiredColumns() methods which are no longer
being used.
We do something wacky by adding an aggregator factory for the post
aggregators when building the intermediate incremental index, otherwise
queries on post aggregate results fail because the data isn't in the
incremental index.
Closes #1419
2015-09-30 15:43:11 -06:00
Charles Allen
bc22d4ff6c
Cleanup kafka-extraction-namespace
...
Remove extra build defines in kafka-extraction-namespace's pom.xml
2015-09-30 11:33:04 -07:00
Xavier Léauté
8a21b4cae3
Merge pull request #1697 from metamx/betterMissingQTLLogging
...
Better logging of URIExtractionNamespace failures due to missing files
2015-09-15 15:29:27 -07:00
Charles Allen
f5ed6e885c
Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
...
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Fangjin Yang
34ef81572d
Merge pull request #1700 from himanshug/update_agg_test_helper
...
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Himanshu Gupta
b989a7054c
fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
...
aka workaround for https://issues.apache.org/jira/browse/HDFS-8750
2015-09-11 15:35:59 -05:00
Himanshu Gupta
67aa3dc153
on HDFS store segments in "dataSource/interval/.." and not in "dataSource/dataSource/interval.."
2015-09-09 11:12:01 -05:00
Himanshu Gupta
5da58e48e0
use Rule based TemporaryFolder for cleanup of temp directory/files
2015-09-09 11:10:33 -05:00
Charles Allen
1977ac9c5d
Better logging of URIExtractionNamespace failures due to missing files
2015-09-08 13:33:32 -07:00
Charles Allen
0b8a3035c6
Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest
2015-09-04 13:02:14 -07:00
Nishant
0096e6a0a0
Merge pull request #1658 from metamx/cleanupJDBCExtractionNamespaceTest
...
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-09-02 23:49:49 +05:30
Xavier Léauté
82f9ecf56b
Merge pull request #1620 from metamx/longFriendlyQTL
...
Allow long values in the key or value fields for URIExtractionNamespace
2015-09-02 10:55:35 -07:00
cheddar
4f61b42f40
Merge pull request #1578 from b-slim/fix_extraction_filter_2
...
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Gian Merlino
940e1aa3eb
Replace funky imports with standard ones.
...
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Himanshu Gupta
2e0dd1d792
adding UTs and addressing review comments to
...
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq
2237a8cf0f
kafka 8 simple consumer firehose
2015-08-27 20:50:46 -05:00
Charles Allen
ac8e32b58e
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-08-26 23:05:51 -07:00
Charles Allen
b24a88b328
Allow long values in the key or value fields for URIExtractionNamespace
2015-08-26 09:44:03 -07:00
Fangjin Yang
33b862166a
Merge pull request #1659 from himanshug/segment_kill_update
...
on kill segment, don't leave version, interval and dataSource dir behind on HDFS
2015-08-26 07:23:20 -07:00
Xavier Léauté
c4d0e8d29b
remove unnecessary pom verbiage
2015-08-25 16:07:03 -07:00
Gian Merlino
2bf9a70bfa
Consolidate SQL retrying by moving logic into the connectors.
...
Also change boolean removeLock to void addLock in MetadataStorageActionHandler.
2015-08-25 12:42:29 -07:00
Himanshu Gupta
5b5a76ef6c
adding unit test for HdfsDataSegmentKiller.testKill(..)
2015-08-23 22:21:03 -05:00
Himanshu Gupta
c2bebfe39e
delete version, interval, dataSource directories on segment deletion if possible, so that they are not left behind and consume ns quota on HDFS
2015-08-23 22:06:12 -05:00
Himanshu Gupta
9b54124cd0
pseudo integration tests for approximate histogram
2015-08-20 01:27:20 -05:00
Xavier Léauté
1abcd75696
Merge pull request #1624 from metamx/expandTimeouts
...
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 21:32:50 -07:00
Xavier Léauté
3b2e41e42a
update for next release
2015-08-18 17:16:46 -07:00
Charles Allen
38110820c3
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 14:28:40 -07:00
Charles Allen
db19d2d547
Revert "Update to guice 4.0"
2015-08-14 09:26:07 -07:00
Charles Allen
76fbb12959
Increase timeout in tests for NamespaceExtractionCacheManagerExecutorsTest
2015-08-11 13:54:54 -07:00
Charles Allen
7e61216287
Update to guice 4.0
...
- Mark a lot of `@Provides` methods as final since guice 4.0 disallows overriding them
2015-08-10 13:57:18 -07:00
Charles Allen
8be82c00bd
Better handling of slow stuff in NamespaceExtractionCacheManagerExecutorsTest
2015-08-07 15:11:54 -07:00
Charles Allen
e6226968a6
Merge pull request #1589 from druid-io/fix-firehose-doc
...
Add a lot more docs for firehoses
2015-08-06 12:45:24 -07:00
Charles Allen
8cdcf69714
Better handle timeouts in namespace tests
2015-08-06 10:20:18 -07:00
fjy
012fff6616
fix firehose docs
2015-08-04 09:52:23 -07:00
Slim Bouguerra
7848429cbf
unused imports
2015-08-03 14:50:52 -05:00
Fangjin Yang
22567946cf
Merge pull request #1259 from metamx/queryTimeLookup
...
Query Time Lookup
2015-07-28 11:43:05 -10:00
Himanshu
cc50217eb0
Merge pull request #1568 from metamx/detailedSegmentLoadingErrors
...
More detailed error logging on segment activities
2015-07-28 13:31:16 -05:00
Charles Allen
86ede702b1
Add namespaced lookups as extensions
...
* Adds kafka, URI, and JDBC namespace definitions
* Add ability to explicitly rename using a "namespace" which is a particular data collection that is loaded on all realtime, historic nodes, and brokers. If any of these nodes has the namespace extension, ALL nodes have the namespace extension.
* Add namespace caching and populating (can be on heap or off heap)
* Add NamespaceExtractionCacheManager for handling caches
* Added ExtractionNamespace for handling metadata on the extraction namespaces
* Added ExtractionNamespaceUpdate for handling metadata related to updates
* Add extension which caches renames from a kafka stream (requires kafka8)
* Added README.md for the namespace kafka extension
* Added docs
* Added namespace/size, namespace/count, namespace/deltaTasksStarted metrics
Add static config for namespaces via `druid.query.extraction.namespace`
* This is a rebase of https://github.com/b-slim/druid/tree/static_config_only
2015-07-28 11:14:14 -07:00
Charles Allen
c492d4448d
More detailed S3DataSegmentKiller error messages
2015-07-27 13:45:03 -07:00
Charles Allen
fe7818ddd2
More detailed AzureDataSegmentKiller error messages
2015-07-27 13:44:59 -07:00
Charles Allen
3f901e7291
More detailed logging of error message on S3DataSegmentMover
2015-07-27 13:28:54 -07:00
Charles Allen
e051e93d19
Merge pull request #1518 from RealROI/more-azure-features
...
Azure Blob Store support for Firehose and Indexing Service Logs
2015-07-17 16:10:22 -07:00
Zak Kristjanson
0bda7af52c
Add more support for Azure Blob Store
...
Azure Blob Store support for Task Logs and a firehose for data ingestion
2015-07-17 15:38:21 -07:00
Xavier Léauté
4cfb00bc8a
increment version
2015-07-15 13:09:05 -07:00
Hao Xia
1931491c9f
A couple of hdfs related fixes
...
* Class loading issue with hdfs-storage extension
* Exception when using hdfs with non-fully qualified segment path
2015-06-19 17:22:20 -07:00
Xavier Léauté
0a5bb909a2
[maven-release-plugin] prepare for next development iteration
2015-06-18 17:35:19 -07:00
Xavier Léauté
59c6b2b279
[maven-release-plugin] prepare release druid-0.8.0-rc1
2015-06-18 17:35:14 -07:00
Charles Allen
f48db09e35
Add optimizations for ExtractionFn by enabling MANY_TO_ONE vs ONE_TO_ONE codepaths
...
* Also adds LookupExtractionFn and MapLookupExtractor which takes in an explicit mapping of renames
* Add injective to javascript extraction fn
2015-06-02 12:22:56 -07:00
fjy
be2a35188e
Additional schema validations and better logs for common extensions
2015-05-27 16:25:02 -07:00
cheddar
c1b1752595
Merge pull request #1383 from metamx/psql-transient
...
retry transient exceptions for PostgreSQL, fixes #1382
2015-05-22 13:01:53 -07:00
Xavier Léauté
6b23e02d2b
retry transient exceptions for PostgreSQL, fixes #1382
2015-05-22 14:47:27 -04:00
flow
07659f30ab
bug fix: hdfs task log and indexing task do not work properly with Hadoop HA
2015-05-21 20:49:42 +08:00
Xavier Léauté
3c3db7229c
Merge pull request #1355 from himanshug/long_max_min_aggregators
...
Long max/min aggregators
2015-05-13 12:08:11 -07:00
Himanshu Gupta
d0ec945129
adding aliases doubleMax and doubleMin for max and min respectively
...
renamed all [Max/Min]*.java to [DoubleMax/DoubleMin]*.java and created [Max/Min]AggregatorFactory.java which can be removed when we don't need the min/max aggregator type backward compatibility
2015-05-13 09:25:41 -05:00
fjy
7a6acf5c1b
update pom to 0.8
2015-05-11 19:41:58 -06:00
David Rodrigues
11a76169b4
Overall improvement on Azure Deep Storage extension.
...
* Remove hard-coded azure path manipulation from the puller.
* Fix segment size not being zero after uploading it to Azure.
* Remove both index and desc files only on a successful upload to Azure.
* Add Azure container name to load spec.
This patch would help future-proof azure deep-storage module and avoid
having to introduce ugly backwards-compatibility fixes when we want to
support multiple containers or moving data between containers.
2015-05-05 15:17:25 -07:00
Charles Allen
16a0c40d4c
Fix concatenated gzip files in StaticS3FirehoseFactory
2015-04-24 15:06:28 -07:00
David Pinheiro
baeef08c4c
Add Microsoft Azure as a Deep Storage option.
2015-04-16 15:39:36 -07:00
Charles Allen
abdeaa0746
Add stricter checking for potential coding errors
...
Can use via `mvn clean compile test-compile -P strict`
2015-04-15 14:52:25 -07:00
Charles Allen
b29816bddb
Minor fix in hdfs-storage pom.xml
2015-04-08 14:29:16 -07:00
Fangjin Yang
208e307915
Merge pull request #1251 from metamx/uriSegmentLoaders
...
Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
2015-03-30 17:43:51 -07:00
fjy
aea7f9d192
[maven-release-plugin] prepare for next development iteration
2015-03-30 16:35:24 -07:00
fjy
060d7aef03
[maven-release-plugin] prepare release druid-0.7.1
2015-03-30 16:35:20 -07:00
Charles Allen
1c6cbea89c
Revert "Revert "Overhaul of SegmentPullers to add consistency and retries""
...
This reverts commit f904bc7858.
2015-03-30 13:40:04 -07:00
Fangjin Yang
f904bc7858
Revert "Overhaul of SegmentPullers to add consistency and retries"
2015-03-30 13:15:50 -07:00
Charles Allen
6d407e8677
Add URI handling to SegmentPullers
...
* Requires https://github.com/druid-io/druid-api/pull/37
* Requires https://github.com/metamx/java-util/pull/22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
* General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
2015-03-30 12:33:23 -07:00
Prajwal Tuladhar
fb7005435b
use ByteSink and ByteSource instead of OutputSupplier and InputSupplier
...
They are being deprecated and will eventually be removed in Guava 18.0
2015-03-26 14:45:00 -04:00
Charles Allen
3ed4b19201
Update mysql-connector-java to 5.1.34
2015-03-23 15:43:34 -07:00
fjy
b389cfe404
[maven-release-plugin] prepare for next development iteration
2015-03-19 12:38:17 -07:00
fjy
60e7d543cc
[maven-release-plugin] prepare release druid-0.7.1-rc1
2015-03-19 12:38:13 -07:00
fjy
6a47c1530c
update versions to prepare for rc release
2015-03-19 11:39:38 -07:00
Xavier Léauté
11b3230602
update to kafka 0.8.2.1, because it's better™
2015-03-12 09:59:24 -07:00
Xavier Léauté
217e674063
Handling aggregators and post aggregators with duplicate names
...
* add test for same-name groupBy hyperUniques post-agg
* add test for same-name post-agg in groupby with approx histogram
* Fixes https://github.com/druid-io/druid/issues/1045
* Throws an error if post aggs and aggs do not have unique names
* Add more groupBy tests for Having filters
2015-03-10 17:10:43 -07:00
Fangjin Yang
e8605c63a9
Merge pull request #1150 from himanshug/broker-parallel-chunk-process
...
interval chunk query runner now processes individual chunk in a threadpool
2015-03-02 13:50:23 -08:00
Himanshu Gupta
29039fd541
interval chunk query runner now processes individual chunk in a thread pool and prints metrics query/time per chunk
2015-03-02 15:45:09 -06:00
Xavier Léauté
b167dcf82c
[maven-release-plugin] prepare for next development iteration
2015-02-23 14:28:06 -08:00
Xavier Léauté
e81ac2ba43
[maven-release-plugin] prepare release druid-0.7.0
2015-02-23 14:27:58 -08:00
Xavier Léauté
38e8dfdc98
replace Kafka 0.8.1.1 with 0.8.2.0 stable
2015-02-13 14:48:36 -08:00
Xavier Léauté
1971c1679c
do not build kafka-seven extension by default
2015-02-13 14:32:47 -08:00
Xavier Léauté
78df7f6165
Move Druid release artifacts to Sonatype
...
- Switch to using Druid parent POM
- Add required fields for Sonatype
- Common plugin versions and settings have been moved to the parent pom
- Cleanup artifacts and POMs for consistent formatting
- Remove org.hyperic.sigar dependency and update docs to reflect necessary jars to add at runtime when sigar is needed
2015-02-13 14:26:31 -08:00
fjy
d29740ed9f
[maven-release-plugin] prepare for next development iteration
2015-02-12 16:16:00 -08:00
fjy
211fd15b7e
[maven-release-plugin] prepare release druid-0.7.0-rc3
2015-02-12 16:15:56 -08:00
fjy
1f12c5b2f1
[maven-release-plugin] prepare for next development iteration
2015-02-03 12:06:49 -08:00
fjy
e82d431be7
[maven-release-plugin] prepare release druid-0.7.0-rc2
2015-02-03 12:06:41 -08:00
Fangjin Yang
92e616de11
Merge pull request #1077 from metamx/remove-unused-imports
...
remove unused imports
2015-02-02 10:45:27 -08:00
nishantmonu51
ba932bb1f2
remove unused imports
2015-02-02 21:53:39 +05:30