138 Commits

Author SHA1 Message Date
Fangjin Yang
1b46ea7b3d Merge pull request #2121 from metamx/jdbcExtractionNamespaceLocking
Add nicer locking and shorter timeouts to JDBCExtractionNamespaceTest
2015-12-18 19:02:36 -08:00
Fangjin Yang
14229ba0f2 Merge pull request #1922 from metamx/jsonIgnoresFinalFields
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
Charles Allen
409eb0b7c6 Add nicer locking and shorter timeouts to JDBCExtractionNamespaceTest 2015-12-18 10:33:38 -08:00
Slim Bouguerra
ee1a39801a adding bulk lookup and reverse lookup 2015-12-10 08:29:41 -06:00
Fangjin Yang
f4ba13a1ac Merge pull request #2029 from b-slim/add_reverse_fn
Adding reverse lookup function to LookupExtractor.
2015-12-09 12:50:13 -08:00
Slim Bouguerra
85f339b687 introduction and implem of reverse lookup function unApply. 2015-12-09 10:02:57 -06:00
Gian Merlino
f6f7bec2b6 Update java-util. 2015-12-08 15:32:27 -08:00
Himanshu Gupta
62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Himanshu Gupta
f99bad7988 reformat datasketches module to satisfy druid style guidelines 2015-11-19 01:07:03 -06:00
Himanshu Gupta
fde9df2720 update to sketches-core-0.2.2 .
adds support for "cardinality" aggregator.
do not create sketch per event at ingestion time to make realtime ingestion faster
2015-11-19 01:05:59 -06:00
Fangjin Yang
21c84b5ff7 Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Xavier Léauté
ba41f37ce1 fix #1701 - MySQL 5.7 defaults break database character set check 2015-11-17 15:51:58 -08:00
Fangjin Yang
148153b47c Merge pull request #1897 from himanshug/new_sketch_aggregation
complex aggregator based on http://datasketches.github.io
2015-11-12 09:01:01 -08:00
Himanshu Gupta
338f88b86b further simplifying the api, users just need to use thetaSketch as aggregator 2015-11-12 00:04:34 -06:00
Himanshu Gupta
88ae3c43f9 changing names to be explicit about theta sketch algorithm
old names are still valid though so as to be backwards compatible for now
2015-11-12 00:04:34 -06:00
Himanshu Gupta
817cf41f5c druid aggregators based on datasketches lib http://datasketches.github.io/ 2015-11-12 00:04:33 -06:00
Gian Merlino
e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
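
Illustrative note on the sequence idea described in the commit above: a minimal sketch, assuming that an allocation is keyed by the sequence name plus the previously allocated segment id. The class, method, and id format below are hypothetical stand-ins for the "pendingSegments" table and are not taken from the Druid code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the "pendingSegments" table (not Druid's actual code).
class PendingSegments {
    // Key: sequence name + previous segment id. Value: the segment id allocated for that step.
    private final Map<String, String> allocated = new HashMap<>();
    private int nextPartition = 0;

    // Returns the segment that follows previousId in the given sequence.
    // A new segment id is allocated only the first time a given step is requested;
    // later callers in the same sequence receive the same id.
    synchronized String allocate(String dataSource, String interval,
                                 String sequenceName, String previousId) {
        String key = sequenceName + "|" + (previousId == null ? "" : previousId);
        return allocated.computeIfAbsent(
            key, k -> dataSource + "_" + interval + "_" + nextPartition++);
    }
}
```

Under these assumptions, two replica tasks that share a sequence name and walk through the same steps receive identical segment ids, while a task in a different sequence is given its own.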
Xavier Léauté
fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
Charles Allen
abae47850a Add backwards compatibility for PR #1922 2015-11-11 10:27:00 -08:00
Charles Allen
1df4baf489 Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Charles Allen
929b981710 Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to 2015-11-05 18:10:13 -08:00
Lou Marvin Caraig
c924f9fe56 Added cloudfiles-extensions in order to support Rackspace's cloudfiles as deep storage 2015-11-04 17:44:48 +01:00
Himanshu Gupta
e9cfb7f46f refer to top level property for hadoop version instead of hardcoding 2.3.0 2015-10-26 15:51:48 -05:00
Xavier Léauté
e4ac78e43d bump next snapshot to 0.9.0 2015-10-20 13:46:13 -07:00
Xavier Léauté
4c2c7a2c37 update version to 0.8.3 2015-10-14 21:40:55 -07:00
Gian Merlino
e3bb93e8c7 Revert "Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2"
This reverts commit dae488b7c001f11f88584598fcfa15026880794d, reversing
changes made to 397be4b897ccbd84d778218224d7dabb3db0a102.
2015-10-01 00:05:59 -04:00
Fangjin Yang
dae488b7c0 Merge pull request #1781 from dclim/nested-groupby-multiple-same-aggregator-fix-v2
Fix failure in nested groupBy with multiple aggregators with same fie…
2015-09-30 22:28:34 -04:00
David Lim
70ae5ca922 Fix failure in nested groupBy with multiple aggregators with same fieldName
Version 2 - Throws an exception if an outer query references an
aggregator that doesn't exist in the inner query, and then uses the
inner query aggregator names to form the columns for the intermediate
incremental index.

Also deleted all the getRequiredColumns() methods which are no longer
being used.

We do something wacky by adding an aggregator factory for the post
aggregators when building the intermediate incremental index, otherwise
queries on post aggregate results fail because the data isn't in the
incremental index.

Closes #1419
2015-09-30 15:43:11 -06:00
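
Illustrative note on the check described in the commit above: a minimal sketch of rejecting an outer query that references an aggregator missing from the inner query. The class and method names are hypothetical, and the real change operates on Druid's query and aggregator factory types rather than on plain strings.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper (not Druid's actual code).
class NestedGroupByCheck {
    // Throws if the outer query reads a field that no inner aggregator produces.
    static void validate(Set<String> innerAggregatorNames, List<String> outerFieldNames) {
        for (String field : outerFieldNames) {
            if (!innerAggregatorNames.contains(field)) {
                throw new IllegalArgumentException(
                    "Outer query references unknown inner aggregator: " + field);
            }
        }
    }
}
```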
Charles Allen
bc22d4ff6c Cleanup kafka-extraction-namespace
Remove extra build defines in kafka-extraction-namespace's pom.xml
2015-09-30 11:33:04 -07:00
Xavier Léauté
8a21b4cae3 Merge pull request #1697 from metamx/betterMissingQTLLogging
Better logging of URIExtractionNamespace failures due to missing files
2015-09-15 15:29:27 -07:00
Charles Allen
f5ed6e885c Merge pull request #1702 from himanshug/double_datasource_in_storage_dir
do not have dataSource twice in path to segment storage on hdfs
2015-09-15 14:00:35 -07:00
Fangjin Yang
34ef81572d Merge pull request #1700 from himanshug/update_agg_test_helper
update indexing in the helper to use multiple persists and merge
2015-09-14 06:56:29 -07:00
Himanshu Gupta
b989a7054c fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
aka workaround for https://issues.apache.org/jira/browse/HDFS-8750
2015-09-11 15:35:59 -05:00
Himanshu Gupta
67aa3dc153 on HDFS store segments in "dataSource/interval/.." and not in "dataSource/dataSource/interval.." 2015-09-09 11:12:01 -05:00
Himanshu Gupta
5da58e48e0 use Rule based TemporaryFolder for cleanup of temp directory/files 2015-09-09 11:10:33 -05:00
Charles Allen
1977ac9c5d Better logging of URIExtractionNamespace failures due to missing files 2015-09-08 13:33:32 -07:00
Charles Allen
0b8a3035c6 Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest 2015-09-04 13:02:14 -07:00
Nishant
0096e6a0a0 Merge pull request #1658 from metamx/cleanupJDBCExtractionNamespaceTest
Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
2015-09-02 23:49:49 +05:30
Xavier Léauté
82f9ecf56b Merge pull request #1620 from metamx/longFriendlyQTL
Allow long values in the key or value fields for URIExtractionNamespace
2015-09-02 10:55:35 -07:00
cheddar
4f61b42f40 Merge pull request #1578 from b-slim/fix_extraction_filter_2
Fix UT and documentation to the extraction filter
2015-09-01 10:46:20 -07:00
Gian Merlino
940e1aa3eb Replace funky imports with standard ones.
1) Lots of Guava imports were not coming from the actual Guava
2) junit.framework.Assert should be org.junit.Assert
2015-08-28 18:02:05 -07:00
Himanshu Gupta
2e0dd1d792 adding UTs and addressing review comments to
firehoseV2 addition to Realtime[Manager|Plumber],
essential segment metadata persist support,
kafka-simple-consumer-firehose extension patch
2015-08-27 20:50:46 -05:00
lvjq
2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Charles Allen
ac8e32b58e Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest 2015-08-26 23:05:51 -07:00
Charles Allen
b24a88b328 Allow long values in the key or value fields for URIExtractionNamespace 2015-08-26 09:44:03 -07:00
Fangjin Yang
33b862166a Merge pull request #1659 from himanshug/segment_kill_update
on kill segment, don't leave version, interval and dataSource dir behind on HDFS
2015-08-26 07:23:20 -07:00
Xavier Léauté
c4d0e8d29b remove unnecessary pom verbiage 2015-08-25 16:07:03 -07:00
Gian Merlino
2bf9a70bfa Consolidate SQL retrying by moving logic into the connectors.
Also change boolean removeLock to void addLock in MetadataStorageActionHandler.
2015-08-25 12:42:29 -07:00
Himanshu Gupta
5b5a76ef6c adding unit test for HdfsDataSegmentKiller.testKill(..) 2015-08-23 22:21:03 -05:00
Himanshu Gupta
c2bebfe39e delete version, interval, dataSource directories on segment deletion if possible, so that they are not left behind and consume ns quota on HDFS 2015-08-23 22:06:12 -05:00