Himanshu Gupta
f99bad7988
reformat datasketches module to satisfy druid style guidelines
2015-11-19 01:07:03 -06:00
Himanshu Gupta
fde9df2720
update to sketches-core-0.2.2.
adds support for "cardinality" aggregator.
do not create sketch per event at ingestion time to make realtime ingestion faster
2015-11-19 01:05:59 -06:00
Fangjin Yang
21c84b5ff7
Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Xavier Léauté
ba41f37ce1
fix #1701 - MySQL 5.7 defaults break database character set check
2015-11-17 15:51:58 -08:00
Fangjin Yang
148153b47c
Merge pull request #1897 from himanshug/new_sketch_aggregation
complex aggregator based on http://datasketches.github.io
2015-11-12 09:01:01 -08:00
Himanshu Gupta
338f88b86b
further simplifying the api, users just need to use thetaSketch as aggregator
2015-11-12 00:04:34 -06:00
Himanshu Gupta
88ae3c43f9
changing names to be explicit about theta sketch algorithm
old names are still valid though so as to be backwards compatible for now
2015-11-12 00:04:34 -06:00
Himanshu Gupta
817cf41f5c
druid aggregators based on datasketches lib http://datasketches.github.io/
2015-11-12 00:04:33 -06:00
Gian Merlino
e4e5f0375b
SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
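Below is a minimal sketch of the sequence idea described in this commit. It assumes that allocations are keyed by (sequence name, previous segment id, interval), so replicas that issue identical requests receive identical segment ids. The in-memory `PendingSegmentsTable` class, its `allocate` method and the segment id format are hypothetical illustrations. They are not Druid's actual metadata-store schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PendingSegmentsTable:
    """Hypothetical in-memory stand-in for a "pendingSegments" metadata table."""

    # (sequence_name, previous_segment_id, interval) -> allocated segment id
    allocations: Dict[Tuple[str, Optional[str], str], str] = field(default_factory=dict)
    # interval -> next unused partition number
    next_partition: Dict[str, int] = field(default_factory=dict)

    def allocate(self, data_source: str, interval: str, version: str,
                 sequence_name: str, previous_segment_id: Optional[str]) -> str:
        key = (sequence_name, previous_segment_id, interval)
        # A replica repeating a request already seen gets the same answer back.
        if key in self.allocations:
            return self.allocations[key]
        # Otherwise hand out the next partition number for this interval.
        partition = self.next_partition.get(interval, 0)
        self.next_partition[interval] = partition + 1
        # Assumed id format, for illustration only.
        segment_id = f"{data_source}_{interval}_{version}_{partition}"
        self.allocations[key] = segment_id
        return segment_id
```

Suppose two replicas share a sequence name and chain each request to the segment they were last given. They then walk the same sequence of segment ids. A task in a different sequence draws fresh partition numbers instead.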
Xavier Léauté
fa6142e217
cleanup and remove unused imports
2015-11-11 12:25:21 -08:00
Charles Allen
1df4baf489
Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Lou Marvin Caraig
c924f9fe56
Added cloudfiles-extensions in order to support Rackspace's cloudfiles as deep storage
2015-11-04 17:44:48 +01:00
Himanshu Gupta
e9cfb7f46f
refer to top level property for hadoop version instead of hardcoding 2.3.0
2015-10-26 15:51:48 -05:00
Xavier Léauté
e4ac78e43d
bump next snapshot to 0.9.0
2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37
update version to 0.8.3
2015-10-14 21:40:55 -07:00
Gian Merlino
e3bb93e8c7
Revert "Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2"
This reverts commit dae488b7c0, reversing changes made to 397be4b897.
2015-10-01 00:05:59 -04:00
Fangjin Yang
dae488b7c0
Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2
Fix failure in nested groupBy with multiple aggregators with same fie…
2015-09-30 22:28:34 -04:00
David Lim
70ae5ca922
Fix failure in nested groupBy with multiple aggregators with same fieldName
Version 2 - Throws an exception if an outer query references an
aggregator that doesn't exist in the inner query, and then uses the
inner query aggregator names to form the columns for the intermediate
incremental index.
Also deleted all the getRequiredColumns() methods which are no longer
being used.
We do something wacky by adding an aggregator factory for the post
aggregators when building the intermediate incremental index, otherwise
queries on post aggregate results fail because the data isn't in the
incremental index.
Closes #1419
2015-09-30 15:43:11 -06:00
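The validation described in this commit can be sketched as follows. The function `intermediate_columns` and the `UnknownAggregatorError` class are hypothetical names. The sketch assumes that the intermediate index columns are simply the inner query's aggregator names followed by its post-aggregator names. Note that this change was later reverted by e3bb93e8c7.

```python
from typing import Iterable, List, Sequence, Set


class UnknownAggregatorError(ValueError):
    """Hypothetical error: the outer query names a column the inner query does not produce."""


def intermediate_columns(inner_aggregator_names: Sequence[str],
                         inner_post_aggregator_names: Sequence[str],
                         outer_field_names: Iterable[str]) -> List[str]:
    # Post-aggregator outputs are included so outer queries can read them too.
    available: Set[str] = set(inner_aggregator_names) | set(inner_post_aggregator_names)
    for name in outer_field_names:
        if name not in available:
            raise UnknownAggregatorError(
                f"outer query references unknown aggregator '{name}'")
    # Columns are keyed by the inner query's names, so several outer
    # aggregators reading the same fieldName do not create clashing columns.
    return list(inner_aggregator_names) + list(inner_post_aggregator_names)
```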
Charles Allen
bc22d4ff6c
Cleanup kafka-extraction-namespace
Remove extra build defines in kafka-extraction-namespace's pom.xml
2015-09-30 11:33:04 -07:00
Xavier Léauté
8a21b4cae3
Merge pull request #1697 from metamx/betterMissingQTLLogging
Better logging of URIExtractionNamespace failures due to missing files
2015-09-15 15:29:27 -07:00
Charles Allen
f5ed6e885c
Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Fangjin Yang
34ef81572d
Merge pull request #1700 from himanshug/update_agg_test_helper
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Himanshu Gupta
b989a7054c
fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
aka workaround for https://issues.apache.org/jira/browse/HDFS-8750
2015-09-11 15:35:59 -05:00
Himanshu Gupta
67aa3dc153
on HDFS store segments in "dataSource/interval/.." and not in "dataSource/dataSource/interval.."
2015-09-09 11:12:01 -05:00
Himanshu Gupta
5da58e48e0
use Rule based TemporaryFolder for cleanup of temp directory/files
2015-09-09 11:10:33 -05:00
Charles Allen
1977ac9c5d
Better logging of URIExtractionNamespace failures due to missing files
2015-09-08 13:33:32 -07:00
Charles Allen
0b8a3035c6
Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest
2015-09-04 13:02:14 -07:00
Nishant
0096e6a0a0
Merge pull request #1658 from metamx/cleanupJDBCExtractionNamespaceTest
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-09-02 23:49:49 +05:30
Xavier Léauté
82f9ecf56b
Merge pull request #1620 from metamx/longFriendlyQTL
Allow long values in the key or value fields for URIExtractionNamespace
2015-09-02 10:55:35 -07:00
cheddar
4f61b42f40
Merge pull request #1578 from b-slim/fix_extraction_filter_2
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Gian Merlino
940e1aa3eb
Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Himanshu Gupta
2e0dd1d792
adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq
2237a8cf0f
kafka 8 simple consumer firehose
2015-08-27 20:50:46 -05:00
Charles Allen
ac8e32b58e
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-08-26 23:05:51 -07:00
Charles Allen
b24a88b328
Allow long values in the key or value fields for URIExtractionNamespace
2015-08-26 09:44:03 -07:00
Fangjin Yang
33b862166a
Merge pull request #1659 from himanshug/segment_kill_update
on kill segment, don't leave version, interval and dataSource dir behind on HDFS
2015-08-26 07:23:20 -07:00
Xavier Léauté
c4d0e8d29b
remove unnecessary pom verbiage
2015-08-25 16:07:03 -07:00
Gian Merlino
2bf9a70bfa
Consolidate SQL retrying by moving logic into the connectors.
Also change boolean removeLock to void addLock in MetadataStorageActionHandler.
2015-08-25 12:42:29 -07:00
Himanshu Gupta
5b5a76ef6c
adding unit test for HdfsDataSegmentKiller.testKill(..)
2015-08-23 22:21:03 -05:00
Himanshu Gupta
c2bebfe39e
delete version, interval, dataSource directories on segment deletion if possible, so that they are not left behind and consume ns quota on HDFS
2015-08-23 22:06:12 -05:00
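Below is a sketch of the pruning idea. It uses a local filesystem as a stand-in for HDFS and assumes a `dataSource/interval/version/partition/` layout under a storage root. Both the layout and the `kill_segment` helper are assumptions for illustration; the real killer works against HDFS. After removing the segment file, the sketch walks up the tree and removes each parent directory only while it is empty. It stops at the storage root.

```python
from pathlib import Path


def kill_segment(segment_file: Path, storage_root: Path) -> None:
    """Hypothetical helper: delete a segment file, then prune empty parent dirs."""
    storage_root = storage_root.resolve()
    segment_file = segment_file.resolve()
    segment_file.unlink()
    parent = segment_file.parent
    # Walk up (partition -> version -> interval -> dataSource), never past the root.
    while parent != storage_root and storage_root in parent.parents:
        if any(parent.iterdir()):
            break  # still holds other segments; leave it and everything above it
        parent.rmdir()
        parent = parent.parent
```

Stopping at the first non-empty directory keeps segments that still exist under the same dataSource or interval untouched. Meanwhile, empty directories no longer accumulate and eat into the HDFS namespace quota.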
Himanshu Gupta
9b54124cd0
pseudo integration tests for approximate histogram
2015-08-20 01:27:20 -05:00
Xavier Léauté
1abcd75696
Merge pull request #1624 from metamx/expandTimeouts
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 21:32:50 -07:00
Xavier Léauté
3b2e41e42a
update for next release
2015-08-18 17:16:46 -07:00
Charles Allen
38110820c3
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 14:28:40 -07:00
Charles Allen
db19d2d547
Revert "Update to guice 4.0"
2015-08-14 09:26:07 -07:00
Charles Allen
76fbb12959
Increase timeout in tests for NamespaceExtractionCacheManagerExecutorsTest
2015-08-11 13:54:54 -07:00
Charles Allen
7e61216287
Update to guice 4.0
- Mark a lot of `@Provides` methods as final since guice 4.0 disallows overriding them
2015-08-10 13:57:18 -07:00
Charles Allen
8be82c00bd
Better handling of slow stuff in NamespaceExtractionCacheManagerExecutorsTest
2015-08-07 15:11:54 -07:00
Charles Allen
e6226968a6
Merge pull request #1589 from druid-io/fix-firehose-doc
Add a lot more docs for firehoses
2015-08-06 12:45:24 -07:00
Charles Allen
8cdcf69714
Better handle timeouts in namespace tests
2015-08-06 10:20:18 -07:00