Commit Graph

165 Commits

Author SHA1 Message Date
Fangjin Yang 0c31f007fc Merge pull request #1728 from himanshug/aggregators_in_segment_metadata
Store AggregatorFactory[] in segment metadata
2016-01-19 12:55:49 -08:00
Himanshu Gupta a99aef29a1 adding aggregators to segment metadata 2016-01-19 14:23:39 -06:00
Himanshu Gupta 52eb0f04a7 adding a new method getMergingFactory(..) to AggregatorFactory 2016-01-18 22:03:46 -06:00
Himanshu Gupta 77fc86c015 making AggregatorFactory abstract class 2016-01-18 22:03:46 -06:00
Himanshu Gupta dcd3a24f59 adding log line for segment being killed in HdfsDataSegmentKiller 2016-01-18 21:51:04 -06:00
Kurt Young 1f2168fae5 add IndexMergerV9
add unit tests for IndexMergerV9 and fix some bugs

add more unit tests and fix bugs

handle null values and add more tests

minor changes & use LoggingProgressIndicator in IndexGeneratorReducer

make some static class public from IndexMerger

minor changes and add some comments

changes for comments
2016-01-16 11:25:28 +08:00
Charles Allen 13c63bad72 Make timeouts more explicit on what is failing in JDBCExtractionNamespaceTest 2016-01-07 11:16:36 -08:00
Fangjin Yang aaea95ed1b Merge pull request #2207 from himanshug/theta_sketch_select_query
fix bug for thetaSketch metric not working with select queries
2016-01-07 09:46:09 -08:00
fjy 2103906a48 add pusher tests for all deep storages 2016-01-05 22:22:48 -08:00
Himanshu Gupta c6634d7c2c adding json for thetaSketch Memory object representation 2016-01-05 22:12:52 -06:00
Himanshu Gupta 62e5e45da8 add select query UT for thetaSketch 2016-01-05 22:12:52 -06:00
Himanshu Gupta 3f048f0b15 adding support to execute Select queries in AggregationTestHelper so that Select query based UTs can be written for complex aggregator implementations 2016-01-05 21:54:55 -06:00
Charles Allen 6d886da7d9 Merge pull request #2191 from duilio/fix-rackspace-cloudfiles-segment-size
store uncompressed index size on cloudfiles storage extension
2016-01-05 17:17:35 -08:00
Zhao Weinan 5e57ddb8cc Adding avro support to realtime & hadoop batch indexing. 2016-01-05 10:21:27 +08:00
Charles Allen 957646be2c Fixes to JDBCExtractionNamespaceTest 2016-01-04 09:56:07 -08:00
maurizio 5ea0b96d9a store uncompressed index size instead of the compressed one in cf storage extension 2016-01-04 14:50:27 +01:00
fjy 57d91d754d Comment out buggy unit tests, fix #2185 2016-01-03 09:50:16 -08:00
fjy 89fc18bb55 increase timeouts for jdbc tearDown 2016-01-01 20:08:06 -08:00
fjy ca46f1d40c attempt to fix transient tests again 2015-12-30 21:39:28 -08:00
Bingkun Guo 492adeaaa7 Merge pull request #2172 from gianm/remove-kafka-seven
Remove unused kafka-seven extension.
2015-12-29 15:19:28 -06:00
Fangjin Yang b1261035a7 Merge pull request #1861 from guobingkun/insert_segment_tool
insert-segment tool
2015-12-29 10:06:07 -08:00
Gian Merlino 891d639188 Remove unused kafka-seven extension. 2015-12-29 12:05:27 -05:00
fjy 38b0f1fbc2 fix transient failures in unit tests 2015-12-28 20:03:30 -08:00
Fangjin Yang e490650865 Merge pull request #2110 from navis/fix-sporadic-testfail
Fix sporadic fail of URIExtractionNamespaceFunctionFactoryTest#testReverseFunction
2015-12-27 14:45:09 -08:00
Charles Allen 05c9e1b598 Reorder Before/After in JDBCExtractionNamespaceTest
* Fixes https://github.com/druid-io/druid/issues/2120
2015-12-22 11:39:46 -08:00
Bingkun Guo 89b477970f DataSegmentFinder tool
`insert-segment-to-db` is a tool that inserts segments into Druid metadata storage. It is intended for updating
the segment table in metadata storage after segments have been manually migrated from one place to another.
It can also be used to insert missing segments into Druid, or even to recover metadata storage by telling it where the
segments are stored.

Note: This tool expects users to have the Druid cluster running in a "safe" mode, where no active tasks can interfere with
the segments being inserted. Users can optionally bring down the cluster to make 100% sure nothing is interfering.
2015-12-21 00:02:04 -06:00
Fangjin Yang 1b46ea7b3d Merge pull request #2121 from metamx/jdbcExtractionNamespaceLocking
Add nicer locking and shorter timeouts to JDBCExtractionNamespaceTest
2015-12-18 19:02:36 -08:00
Fangjin Yang 14229ba0f2 Merge pull request #1922 from metamx/jsonIgnoresFinalFields
Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to
2015-12-18 15:38:32 -08:00
Charles Allen 409eb0b7c6 Add nicer locking and shorter timeouts to JDBCExtractionNamespaceTest 2015-12-18 10:33:38 -08:00
navis.ryu 31b205afcd Fix sporadic fail of URIExtractionNamespaceFunctionFactoryTest#testReverseFunction 2015-12-18 14:37:00 +09:00
Slim Bouguerra ee1a39801a adding bulk lookup and reverse lookup 2015-12-10 08:29:41 -06:00
Fangjin Yang f4ba13a1ac Merge pull request #2029 from b-slim/add_reverse_fn
Adding reverse lookup function to LookupExtractor.
2015-12-09 12:50:13 -08:00
Slim Bouguerra 85f339b687 introduction and implementation of reverse lookup function unApply. 2015-12-09 10:02:57 -06:00
Gian Merlino f6f7bec2b6 Update java-util. 2015-12-08 15:32:27 -08:00
Himanshu Gupta 62ba9ade37 unifying license header in all java files 2015-12-05 22:16:23 -06:00
Himanshu Gupta f99bad7988 reformat datasketches module to satisfy druid style guidelines 2015-11-19 01:07:03 -06:00
Himanshu Gupta fde9df2720 update to sketches-core-0.2.2.
adds support for "cardinality" aggregator.
do not create sketch per event at ingestion time to make realtime ingestion faster
2015-11-19 01:05:59 -06:00
Fangjin Yang 21c84b5ff7 Merge pull request #1896 from gianm/allocate-segment
SegmentAllocateAction (fixes #1515)
2015-11-18 21:05:46 -08:00
Xavier Léauté ba41f37ce1 fix #1701 - MySQL 5.7 defaults break database character set check 2015-11-17 15:51:58 -08:00
Fangjin Yang 148153b47c Merge pull request #1897 from himanshug/new_sketch_aggregation
complex aggregator based on http://datasketches.github.io
2015-11-12 09:01:01 -08:00
Himanshu Gupta 338f88b86b further simplifying the api, users just need to use thetaSketch as aggregator 2015-11-12 00:04:34 -06:00
Himanshu Gupta 88ae3c43f9 changing names to be explicit about theta sketch algorithm
old names are still valid though so as to be backwards compatible for now
2015-11-12 00:04:34 -06:00
Himanshu Gupta 817cf41f5c druid aggregators based on datasketches lib http://datasketches.github.io/ 2015-11-12 00:04:33 -06:00
Gian Merlino e4e5f0375b SegmentAllocateAction (fixes #1515)
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).

The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.

The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
2015-11-11 16:54:35 -08:00
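
The sequence idea in the SegmentAllocateAction message above can be illustrated with a small sketch. This is not Druid's implementation: the `PendingSegments` class, its `allocate` method, the sequence names, and the segment id format are all hypothetical, and an in-memory dict stands in for the "pendingSegments" table. It only shows why replica tasks that share a sequence name and ask in the same order end up with the same segments.

```python
class PendingSegments:
    """Toy in-memory stand-in for the "pendingSegments" table (hypothetical API)."""

    def __init__(self):
        # (sequence_name, previous_segment_id) -> allocated segment id
        self._allocated = {}
        # interval -> next unused partition number
        self._next_partition = {}

    def allocate(self, sequence_name, previous_id, interval):
        """Return the segment that follows previous_id in the given sequence.

        A repeated request with the same sequence name and previous id gets
        the segment that was already allocated, so replicas stay in step.
        """
        key = (sequence_name, previous_id)
        if key in self._allocated:
            return self._allocated[key]
        partition = self._next_partition.get(interval, 0)
        self._next_partition[interval] = partition + 1
        segment_id = f"{interval}_{partition}"  # made-up id format
        self._allocated[key] = segment_id
        return segment_id


table = PendingSegments()
interval = "2015-11-11T00/2015-11-11T01"

# Two replica tasks reading the same offsets use the same sequence name.
replica_a_first = table.allocate("offsets_0_100", None, interval)
replica_b_first = table.allocate("offsets_0_100", None, interval)
replica_a_second = table.allocate("offsets_0_100", replica_a_first, interval)
replica_b_second = table.allocate("offsets_0_100", replica_b_first, interval)

# A task reading different data uses its own sequence and gets a new segment.
other_first = table.allocate("offsets_100_200", None, interval)
```

In this sketch the number of segments per interval is not fixed up front: each new (sequence, previous id) pair simply takes the next partition number, which mirrors the stated goal of publishing a variable number of segments per interval.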
Xavier Léauté fa6142e217 cleanup and remove unused imports 2015-11-11 12:25:21 -08:00
Charles Allen abae47850a Add backwards compatibility for PR #1922 2015-11-11 10:27:00 -08:00
Charles Allen 1df4baf489 Move Jackson Guice adapters into io.druid
* Removes access to protected methods in com.fasterxml
* Eliminates druid-common's use of foreign package com.fasterxml
2015-11-09 10:50:45 -08:00
Charles Allen 929b981710 Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to 2015-11-05 18:10:13 -08:00
Lou Marvin Caraig c924f9fe56 Added cloudfiles-extensions in order to support Rackspace's cloudfiles as deep storage 2015-11-04 17:44:48 +01:00
Himanshu Gupta e9cfb7f46f refer to top level property for hadoop version instead of hardcoding 2.3.0 2015-10-26 15:51:48 -05:00