Commit Graph

128 Commits

Author SHA1 Message Date
Himanshu Gupta f99bad7988 reformat datasketches module to satisfy druid style guidelines 2015-11-19 01:07:03 -06:00
Himanshu Gupta fde9df2720 update to sketches-core-0.2.2 .
adds support for "cardinality" aggregator.
do not create sketch per event at ingestion time to make realtime ingestion faster
2015-11-19 01:05:59 -06:00
Fangjin Yang 21c84b5ff7 Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Xavier Léauté ba41f37ce1 fix #1701 - MySQL 5.7 defaults break database character set check 2015-11-17 15:51:58 -08:00
Fangjin Yang 148153b47c Merge pull request #1897 from himanshug/new_sketch_aggregation
complex aggregator based on http://datasketches.github.io
2015-11-12 09:01:01 -08:00
Himanshu Gupta 338f88b86b further simplifying the api, users just need to use thetaSketch as aggregator 2015-11-12 00:04:34 -06:00
Himanshu Gupta 88ae3c43f9 changing names to be explicit about theta sketch algorithm
old names are still valid though so as to be backwards compatible for now
2015-11-12 00:04:34 -06:00
Himanshu Gupta 817cf41f5c druid aggregators based on datasketches lib http://datasketches.github.io/ 2015-11-12 00:04:33 -06:00
Gian Merlino e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
Charles Allen 1df4baf489 Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Lou Marvin Caraig c924f9fe56 Added cloudfiles-extensions in order to support Rackspace's cloudfiles as deep storage 2015-11-04 17:44:48 +01:00
Himanshu Gupta e9cfb7f46f refer to top level property for hadoop version instead of hardcoding 2.3.0 2015-10-26 15:51:48 -05:00
Xavier Léauté e4ac78e43d bump next snapshot to 0.9.0 2015-10-20 13:46:13 -07:00
Xavier Léauté 4c2c7a2c37 update version to 0.8.3 2015-10-14 21:40:55 -07:00
Gian Merlino e3bb93e8c7 Revert "Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2"
This reverts commit dae488b7c0, reversing
changes made to 397be4b897.
2015-10-01 00:05:59 -04:00
Fangjin Yang dae488b7c0 Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2
Fix failure in nested groupBy with multiple aggregators with same fie…
2015-09-30 22:28:34 -04:00
David Lim 70ae5ca922 Fix failure in nested groupBy with multiple aggregators with same fieldName
Version 2 - Throws an exception if an outer query references an
aggregator that doesn't exist in the inner query, and then uses the
inner query aggregator names to form the columns for the intermediate
incremental index.

Also deleted all the getRequiredColumns() methods which are no longer
being used.

We do something wacky by adding an aggregator factory for the post
aggregators when building the intermediate incremental index, otherwise
queries on post aggregate results fail because the data isn't in the
incremental index.

Closes #1419
2015-09-30 15:43:11 -06:00
Charles Allen bc22d4ff6c Cleanup kafka-extraction-namespace
Remove extra build defines in kafka-extraction-namespace's pom.xml
2015-09-30 11:33:04 -07:00
Xavier Léauté 8a21b4cae3 Merge pull request #1697 from metamx/betterMissingQTLLogging
Better logging of URIExtractionNamespace failures due to missing files
2015-09-15 15:29:27 -07:00
Charles Allen f5ed6e885c Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Fangjin Yang 34ef81572d Merge pull request #1700 from himanshug/update_agg_test_helper
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Himanshu Gupta b989a7054c fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
aka workaround for https://issues.apache.org/jira/browse/HDFS-8750
2015-09-11 15:35:59 -05:00
Himanshu Gupta 67aa3dc153 on HDFS store segments in "dataSource/interval/.." and not in "dataSource/dataSource/interval.." 2015-09-09 11:12:01 -05:00
Himanshu Gupta 5da58e48e0 use Rule based TemporaryFolder for cleanup of temp directory/files 2015-09-09 11:10:33 -05:00
Charles Allen 1977ac9c5d Better logging of URIExtractionNamespace failures due to missing files 2015-09-08 13:33:32 -07:00
Charles Allen 0b8a3035c6 Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest 2015-09-04 13:02:14 -07:00
Nishant 0096e6a0a0 Merge pull request #1658 from metamx/cleanupJDBCExtractionNamespaceTest
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-09-02 23:49:49 +05:30
Xavier Léauté 82f9ecf56b Merge pull request #1620 from metamx/longFriendlyQTL
Allow long values in the key or value fields for URIExtractionNamespace
2015-09-02 10:55:35 -07:00
cheddar 4f61b42f40 Merge pull request #1578 from b-slim/fix_extraction_filter_2
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Gian Merlino 940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Himanshu Gupta 2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen ac8e32b58e Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest 2015-08-26 23:05:51 -07:00
Charles Allen b24a88b328 Allow long values in the key or value fields for URIExtractionNamespace 2015-08-26 09:44:03 -07:00
Fangjin Yang 33b862166a Merge pull request #1659 from himanshug/segment_kill_update
on kill segment, dont leave version, interval and dataSource dir behind on HDFS
2015-08-26 07:23:20 -07:00
Xavier Léauté c4d0e8d29b remove unnecessary pom verbiage 2015-08-25 16:07:03 -07:00
Gian Merlino 2bf9a70bfa Consolidate SQL retrying by moving logic into the connectors.
Also change boolean removeLock to void addLock in MetadataStorageActionHandler.
2015-08-25 12:42:29 -07:00
Himanshu Gupta 5b5a76ef6c adding unit test for HdfsDataSegmentKiller.testKill(..) 2015-08-23 22:21:03 -05:00
Himanshu Gupta c2bebfe39e delete version, interval, dataSource directories on segment deletion if possible, so that they are not left behind and consume ns quota on HDFS 2015-08-23 22:06:12 -05:00
Himanshu Gupta 9b54124cd0 pseudo integration tests for approximate histogram 2015-08-20 01:27:20 -05:00
Xavier Léauté 1abcd75696 Merge pull request #1624 from metamx/expandTimeouts
Expand timeouts on JDBCExtractionNamespaceTest
2015-08-18 21:32:50 -07:00
Xavier Léauté 3b2e41e42a update for next release 2015-08-18 17:16:46 -07:00
Charles Allen 38110820c3 Expand timeouts on JDBCExtractionNamespaceTest 2015-08-18 14:28:40 -07:00
Charles Allen db19d2d547 Revert "Update to guice 4.0" 2015-08-14 09:26:07 -07:00
Charles Allen 76fbb12959 Increase timeout in tests for NamespaceExtractionCacheManagerExecutorsTest 2015-08-11 13:54:54 -07:00
Charles Allen 7e61216287 Update to guice 4.0
- Mark a lot of `@Provides` methods as final since guice 4.0 disallows overriding them
2015-08-10 13:57:18 -07:00
Charles Allen 8be82c00bd Better handling of slow stuff in NamespaceExtractionCacheManagerExecutorsTest 2015-08-07 15:11:54 -07:00
Charles Allen e6226968a6 Merge pull request #1589 from druid-io/fix-firehose-doc
Add a lot more docs for firehoses
2015-08-06 12:45:24 -07:00
Charles Allen 8cdcf69714 Better handle timeouts in namespace tests 2015-08-06 10:20:18 -07:00