druid

Commit Graph

Author	SHA1	Message	Date
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701)	2016-11-22 16:42:28 -08:00
Erik Dubbelboer	7d36f540e8	WIP: Add Google Storage support (#2458) Also excludes the correct artifacts from #2741	2016-11-16 14:06:45 +05:30
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688)	2016-11-14 09:29:32 -08:00
praveev	52a74cf84f	Use timestamp in millis as Map key instead of DateTime object (#3674) * Use Long timestamp as key instead of DateTime. DateTime representation is screwed up when you store with an obj and read with a different DateTime obj. For example: The code below fails when you use DateTime as key ``` DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles"))); HashMap<DateTime, String> map = new HashMap<>(); map.put(odt, "abc"); DateTime dt = new DateTime(odt.getMillis()); System.out.println(map.get(dt)); ``` * Respect timezone when creating the file. * Update docs with timezone caveat in granularity spec * Remove unused imports	2016-11-11 10:20:20 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670)	2016-11-09 11:07:01 -06:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572)	2016-10-31 15:24:30 -07:00
Akash Dwivedi	6a845e1f7b	Adding getDelegate() to directly access delegate. (#3616) 👍	2016-10-27 15:57:36 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Gian Merlino	dd0bb6da1e	Unit test for #3544: Avoid exceptions for dataSource spec when using s3. (#3571)	2016-10-17 12:41:43 -07:00
Navis Ryu	4554c1214b	Avoid exceptions for dataSource spec when using s3 (#3544)	2016-10-14 18:24:19 -07:00
Akash Dwivedi	078de4fcf9	Use explicit version from HadoopIngestionSpec. (#3554)	2016-10-07 13:59:14 -07:00
praveev	43cdc675c7	Add support for timezone in segment granularity (#3528) * Add support for timezone in segment granularity * CR feedback. Handle null timezone during equals check. * Include timezone in docs. Add timezone for ArbitraryGranularitySpec.	2016-10-03 08:15:42 -07:00
Fokko Driesprong	67920c114e	Fixed info message (#3481)	2016-09-21 15:50:29 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473) Fixes #3241.	2016-09-21 13:40:04 -06:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Himanshu	3b6c81e7c0	fix cleanup of hadoop ingestion intermediate path (#3385)	2016-09-08 01:36:56 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Hamlet Lee	e4f0eac8e6	Fix issue #2707 (#2708)	2016-08-16 12:19:44 -05:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215)	2016-07-14 09:19:17 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166)	2016-06-25 10:27:25 -07:00
Nishant	2696b0c451	Retry for transient exceptions while doing cleanup for Hadoop Jobs (#3177) * fix 1828 fixes https://github.com/druid-io/druid/issues/1828 * remove unused import * Review comment	2016-06-23 13:38:47 -07:00
Nishant	6f330dc816	Better handling for parseExceptions for Batch Ingestion (#3171) * Better handling for parseExceptions * make parseException handling consistent with Realtime * change combiner default val to true * review comments * review comments	2016-06-22 16:38:29 -07:00
Nishant	778f97a80e	attempt to fix-2906 (#2985) * attempt to fix-2984 * review comments * Add test	2016-05-18 15:12:38 -05:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Navis Ryu	49ef4d96cb	Merge pull request #2802 from navis/optimize_multiplepath_concat Optimize adding lots of paths to pathspec	2016-04-11 23:35:28 -05:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Gian Merlino	ff25325f3b	Improved docs for multi-value dimensions. - Add central doc for multi-value dimensions, with some content from other docs. - Link to multi-value dimension doc from topN and groupBy docs. - Fixes a broken link from dimensionspecs.md, which was presciently already linking to this nonexistent doc. - Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes "multi-value") in favor of "multi-value".	2016-03-22 14:40:55 -07:00
Himanshu	00d7021291	Merge pull request #2607 from jon-wei/dim_schema Support use of DimensionSchema class in DimensionsSpec	2016-03-22 11:53:46 -05:00
binlijin	bce600f5d5	Single dimension hash-based partitioning	2016-03-22 13:15:33 +08:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Himanshu	ea3281ad78	Merge pull request #2645 from atomx/gs-scheme Add gs:// hdfs support	2016-03-14 22:15:42 -05:00
Erik Dubbelboer	375620cfb3	Add gs:// hdfs support Used to access google cloud storage	2016-03-12 08:57:57 +00:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Himanshu Gupta	eab8a0b54d	in DatasourceInputFormat code for determining segment block locations avoid the split calulation by helper TextInputFormat	2016-03-10 14:28:53 -06:00
Bingkun Guo	c20d7682a9	log exceptions correctly in DatasourceInputFormat and IndexGeneratorJob	2016-03-09 13:41:31 -06:00
gaodayue	a6dc3703ca	use ISODataTimeFormat for both hdfs and viewfs schema to support Federationed HDFS	2016-03-08 13:55:05 +08:00
Björn Zettergren	2462c82c0e	New defaults for maxRowsInMemory rowFlushBoundary To bring consistency to docs and source this commit changes the default values for maxRowsInMemory and rowFlushBoundary to 75000 after discussion in PR https://github.com/druid-io/druid/pull/2457. The previous default was 500000 and it's lower now on the grounds that it's better for a default to be somewhat less efficient, and work, than to reach for the stars and possibly result in "OutOfMemoryError: java heap space" errors.	2016-03-01 13:50:28 +01:00
Gian Merlino	3534483433	Better handling of ParseExceptions. Two changes: - Allow IncrementalIndex to suppress ParseExceptions on "aggregate". - Add "reportParseExceptions" option to realtime tuning configs. By default this is "false". Behavior of the counters should now be: - processed: Number of rows indexed, including rows where some fields could be parsed and some could not. - thrownAway: Number of rows thrown away due to rejection policy. - unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all). If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would cause an exception to be thrown). In addition, "processed" will only include fully parseable rows (because even partial parse failures will cause exceptions to be thrown). Fixes #2510.	2016-02-23 10:11:43 -08:00
Himanshu Gupta	09ffcae4ae	give user the option to specify the segments for dataSource inputSpec	2016-02-21 23:15:31 -06:00
Himanshu Gupta	2faae9d0d1	In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path	2016-02-09 21:42:32 -06:00
Himanshu Gupta	b3437825f0	add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found	2016-01-26 17:23:55 -06:00
binlijin	cd1c71ceb4	rename persistBackgroundCount to numBackgroundPersistThreads	2016-01-22 14:29:41 +08:00
Charles Allen	2a69a58570	Merge pull request #2149 from binlijin/master Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 17:06:42 -08:00
Fangjin Yang	996c1173c6	Merge pull request #2223 from navis/besteffort-split-locations Best effort to find locations for input splits	2016-01-20 16:53:43 -08:00
Fangjin Yang	695f107870	Merge pull request #2302 from metamx/lowerCaseGranPathTest Make GranularityPathSpecTest check with lower-case enums	2016-01-20 09:18:06 -08:00
Charles Allen	3c5ca3a5f2	Make GranularityPathSpecTest check with lower-case enums	2016-01-20 08:35:13 -08:00
binlijin	8e43e2c446	Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 09:20:09 +08:00
jon-wei	747343e621	Preserve dimension order across indexes during ingestion	2016-01-19 13:34:11 -08:00
Jonathan Wei	df2906a91c	Merge pull request #2290 from gianm/index-merger-v9-stuff Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.	2016-01-19 13:04:00 -08:00
Gian Merlino	1dcf22edb7	Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes. Also parameterize some tests to run with/without buildV9Directly: - IndexGeneratorJobTest - RealtimeIndexTaskTest - RealtimePlumberSchoolTest	2016-01-19 12:15:06 -08:00
Himanshu Gupta	164b0aad7a	removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class instead of a Map to store segment metadata	2016-01-18 22:03:46 -06:00
navis.ryu	f03f7fb625	Best effort to find locations for input splits	2016-01-18 08:31:05 +09:00
Kurt Young	82ff98c2bf	add config for build v9 directly and update docs	2016-01-16 11:26:34 +08:00
Kurt Young	1f2168fae5	add IndexMergerV9 add unit tests for IndexMergerV9 and fix some bugs add more unit tests and fix bugs handle null values and add more tests minor changes & use LoggingProgressIndicator in IndexGeneratorReducer make some static class public from IndexMerger minor changes and add some comments changes for comments	2016-01-16 11:25:28 +08:00
navis.ryu	976ebc45c0	Simplify information in IncrementalIndex	2016-01-12 10:18:11 +09:00
dclim	2308c8c07f	continue hadoop job for sparse intervals	2016-01-07 01:35:08 -07:00
fjy	faf421726b	remove IndexMaker	2015-12-28 14:19:02 -08:00
Fangjin Yang	14229ba0f2	Merge pull request #1922 from metamx/jsonIgnoresFinalFields Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-12-18 15:38:32 -08:00
binlijin	219367221b	optimize InputRowSerde	2015-12-09 09:51:56 +08:00
Fangjin Yang	d957a6602c	Merge pull request #2049 from himanshug/hadoop_indexing_unique_path add a unique string to intermediate path for the hadoop indexing task	2015-12-07 11:46:16 -08:00
Himanshu Gupta	6cfaf59d7e	add a unique string to intermediate path for the hadoop indexing task	2015-12-06 22:20:38 -06:00
Himanshu Gupta	62ba9ade37	unifying license header in all java files	2015-12-05 22:16:23 -06:00
Himanshu Gupta	61aaa09012	support multiple intervals in dataSource input spec	2015-12-03 21:28:04 -06:00
Fangjin Yang	21c84b5ff7	Merge pull request #1896 from gianm/allocate-segment SegmentAllocateAction (fixes #1515)	2015-11-18 21:05:46 -08:00
Gian Merlino	e4e5f0375b	SegmentAllocateAction (fixes #1515) This is a feature meant to allow realtime tasks to work without being told upfront what shardSpec they should use (so we can potentially publish a variable number of segments per interval). The idea is that there is a "pendingSegments" table in the metadata store that tracks allocated segments. Each one has a segment id (the same segment id we know and love) and is also part of a sequence. The sequences are an idea from @cheddar that offers a way of doing replication. If there are N tasks reading exactly the same data with exactly the same logic (think Kafka tasks reading a fixed range of offsets) then you can place them in the same sequence, and they will generate the same sequence of segments.	2015-11-11 16:54:35 -08:00
Xavier Léauté	fa6142e217	cleanup and remove unused imports	2015-11-11 12:25:21 -08:00
Charles Allen	abae47850a	Add backwards compatability for PR #1922	2015-11-11 10:27:00 -08:00
Gian Merlino	dfbd0e2b60	Merge pull request #1925 from gianm/fix-index-generator Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:56:30 -08:00
Gian Merlino	75122dc396	Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:19:58 -08:00
Himanshu Gupta	6bed633121	do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob	2015-11-06 00:40:51 -06:00
Charles Allen	929b981710	Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-11-05 18:10:13 -08:00
Xavier Léauté	223d1ebe9f	fix a very old todo	2015-11-05 13:00:30 -08:00
fjy	8f231fd3e3	cleanup druid codebase	2015-11-04 13:59:53 -08:00
Himanshu Gupta	84f7d8d264	making static final variables in HadoopDruidIndexerConfig upper case	2015-11-02 23:24:26 -06:00
Himanshu Gupta	8b67417ac8	make methods in Index[Merger,Maker,IO] non-static so that they can have appropriate ObjectMapper injected instead of creating one statically	2015-11-02 23:24:26 -06:00
Nishant	3641a0e553	Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582 few fixes delete intermediate file early better exception handling use static pattern instead of compiling it every time Add retry for transient exceptions remove usage of deprecated method. Add test fix imports fix javadoc review comment. review comment: handle crazy snapshot naming review comments remove default retry count in favour of already present constant review comment make random intermediate and final paths. review comment, use temporaryFolder where possible	2015-10-22 21:41:07 +05:30
Himanshu Gupta	0368260018	For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE	2015-10-12 22:19:44 -05:00
Gian Merlino	3aba401ee0	SQLMetadataConnector: Retry table creation, in case something goes wrong. Also rejigger table creation methods to not take a DBI. It's already available inside the connector, and everyone was just using that one anyway.	2015-09-24 21:39:36 -07:00
Himanshu Gupta	e8b9ee85a7	HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow	2015-09-16 10:58:13 -05:00
Himanshu Gupta	74f4572bd4	Lazily deserialize "parser" to InputRowParser in DataSchema so that user hadoop related InputRowParsers are created only when needed this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser and not fail because hadoopy InputRowParser might need hadoop libraries	2015-09-16 10:58:13 -05:00
Himanshu Gupta	9ca6106128	user specified hadoop settings are ignored if explicitly set in code	2015-08-31 10:50:18 -05:00
Gian Merlino	940e1aa3eb	Replace funky imports with standard ones. 1) Lots of Guava imports were not coming from the actual Guava 2) junit.framework.Assert should be org.junit.Assert	2015-08-28 18:02:05 -07:00
jon-wei	e5c4927b14	Add support for parsing BytesWritable strings to Hadoop Indexer	2015-08-28 14:27:14 -07:00
Gian Merlino	414a6fb477	Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat. Fixes #1678. IngestSegmentFirehose (and its users) need to remember which windows of which segments should actually be read, based on a timeline.	2015-08-28 07:32:41 -07:00
Himanshu Gupta	2e0dd1d792	adding UTs and addressing review comments to firehoseV2 addition to Realtime[Manager\|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch	2015-08-27 20:50:46 -05:00
lvjq	2237a8cf0f	kafka 8 simple consumer firehose	2015-08-27 20:50:46 -05:00
Charles Allen	e38cf54bc8	Migrate TestDerbyConnector to a JUnit @Rule	2015-08-26 21:47:40 -07:00
Himanshu Gupta	b3c570e78d	update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling	2015-08-20 21:36:34 -05:00
Himanshu Gupta	85e3ce9096	split hadoop glob path before adding it to MultipleInputs This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	a603bd9547	HadoopGlobPathSplitter implementation to split hadoop glob paths This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	cf3ec8eb46	helpful cause explaining why SegmentDescriptorInfo did not exist	2015-08-19 10:29:04 -05:00
Himanshu Gupta	a3bab5b7d9	IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing	2015-08-16 14:07:35 -05:00
Himanshu Gupta	15fa43dd43	changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up	2015-08-16 14:07:35 -05:00
Himanshu Gupta	45947a1021	add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java	2015-08-16 13:15:38 -05:00
Himanshu Gupta	1ae56f139b	Druid Hadoop InputFormat and pathSpec Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java indexing-service/pom.xml	2015-08-16 13:15:38 -05:00
Himanshu Gupta	f1d309a671	do not run parser if value from InputFormat is already an InputRow	2015-08-14 14:44:22 -05:00
Himanshu Gupta	0eec1bbee2	json serde tests for HadoopTuningConfig	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f836c3a7ac	adding flag useCombiner to hadoop tuning config that can be used to add a hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible	2015-07-20 12:01:53 -05:00
Himanshu Gupta	4ef484048a	take control of InputRow serde between Mapper/Reducer in Hadoop Indexing This allows for arbitrary InputFormat while hadoop batch ingestion that can return records of value type other than Text	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f7a92db332	generic byte[] serde for InputRow	2015-07-20 12:01:53 -05:00
Charles Allen	b2bc46be17	Merge pull request #1484 from tubemogul/feature/1463 JobHelper.ensurePaths will set job properties from config (tuningConf…	2015-07-07 10:58:16 -07:00
Michael Schiff	6ad451a44a	JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config. Adding input paths will create Path and FileSystem instances which may depend on the values in the job config. This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files. IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set) JobHelperTest confirms that JobHelper.ensurePaths adds job properties javadoc for addInputPaths to explain relationship with addJobProperties	2015-07-01 12:45:32 -07:00
Davide Anastasia	4a3a7dd1ad	read hadoop-indexer configuration file from HDFS	2015-06-24 14:08:53 -07:00
Hao Xia	1931491c9f	A couple of hdfs related fixes * Class loading issue with hdfs-storage extension * Exception when using hdfs with non-fully qualified segment path	2015-06-19 17:22:20 -07:00
Charles Allen	94a567732a	Wipe FileContext off the face of the earth * Fixes https://github.com/druid-io/druid/issues/1433 * Works arround https://issues.apache.org/jira/browse/HADOOP-10643 * Reverts to the prior method of renaming	2015-06-16 09:48:09 -07:00
Charles Allen	6230ac90ae	Use IndexMerger for conversion	2015-06-10 11:34:58 -07:00
Charles Allen	056cab93ed	Add Hadoop Converter Job and task * Fixes https://github.com/druid-io/druid/issues/1363 * Add extra utils in JobHelper based on PR feedback	2015-06-09 14:47:38 -07:00
Charles Allen	2a76bdc60a	Abstractify hadoopy indexer configuration. * Moves many items to JobHelper * Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items * Changes functionalities of some of the path methods to always return a path with scheme * Adds retry to uploads * Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()	2015-06-08 10:53:27 -07:00
fjy	be2a35188e	Additional schema validations and better logs for common extensions	2015-05-27 16:25:02 -07:00
Xavier Léauté	4466e77b25	Merge pull request #1371 from guobingkun/unit_test Unit test for IndexGeneratorJob	2015-05-22 10:34:24 -04:00
flow	07659f30ab	bug fix: hdfs task log and indexing task not work properly with Hadoop HA	2015-05-21 20:49:42 +08:00
Bingkun Guo	b46aff12ae	Unit test for IndexGeneratorJob	2015-05-18 12:31:16 -05:00
Fangjin Yang	a2dc58cd2d	Merge pull request #1345 from pjain1/unit_test_warn_fix fix warn msg and some unit tests	2015-05-08 08:06:20 -07:00
Parag Jain	01448d264c	Fix warn msg and added some unit tests	2015-05-07 17:10:05 -05:00
fjy	b19435d172	fix typos with batch ingestion in docs	2015-05-07 14:46:17 -07:00
Bingkun Guo	1ee550dd91	Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminPartitionsJob and LocalDataSegmentKiller	2015-05-04 20:00:29 -07:00
Xavier Léauté	3a3046ccf3	add support for dimension compression - compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier - makes dimension compression configurable via IndexSpec - IndexSpec also enables configuring bitmap and metric compression	2015-04-14 10:44:18 -07:00
Prajwal Tuladhar	3044bf5592	use Job.getInstance() to fix deprecated warnings	2015-04-09 13:22:21 -04:00
Xavier Léauté	8b5fa8f85d	always upload SNAPSHOT self-contained jars	2015-04-03 21:18:09 -07:00
Dia Kharrat	3a6dc99384	log invalid rows in mapper of Hadoop indexer	2015-03-19 22:31:04 -07:00
Dia Kharrat	58d5f5e7f0	Honor ignoreInvalidRows in Hadoop indexer The reducer of the hadoop indexer now ignores lines with parsing exceptions (if enabled by the indexer config).	2015-03-19 22:31:04 -07:00
Himanshu Gupta	8c1f0834ba	Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs	2015-03-19 18:37:08 -05:00
Himanshu Gupta	3f7a7ba5d3	For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat	2015-03-18 16:09:45 -05:00
fjy	bfe10bd156	This fixes arbitrary gran spec breaking	2015-03-17 12:19:43 -07:00
Himanshu Gupta	6a0405de20	fail early if there is no input data for batch hadoop indexing	2015-03-07 12:45:57 -06:00
Himanshu Gupta	30f64ff19e	UTs update for indexing-hadoop	2015-02-25 15:45:57 -08:00
Xavier Léauté	0784d7e30e	Merge pull request #1152 from himanshug/metastorage-pwd-provider support for metadata store PasswordProvider interface	2015-02-25 15:19:37 -08:00
Fangjin Yang	708f35151d	Merge pull request #1121 from gianm/issue-1116 Use the proper FileSystems for writing segments and caching jars. (for issue #1116)	2015-02-25 13:03:59 -08:00
Fangjin Yang	6424815f88	Merge pull request #1097 from metamx/better-hadoop-sort-key Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-25 12:49:58 -08:00
Himanshu Gupta	126262edce	support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file	2015-02-25 14:05:19 -06:00
Himanshu Gupta	01a4f19ea2	removing dependency on NativeS3FileSystem and other file systems	2015-02-23 14:27:50 -06:00
Gian Merlino	fd5a7d1f08	Use the proper FileSystems for writing segments and caching jars. (for issue #1116)	2015-02-12 16:20:10 -08:00
Xavier Léauté	b1ec7afc12	Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-10 14:26:28 -08:00
Fangjin Yang	92e616de11	Merge pull request #1077 from metamx/remove-unused-imports remove unused imports	2015-02-02 10:45:27 -08:00
nishantmonu51	ba932bb1f2	remove unused imports	2015-02-02 21:53:39 +05:30
fjy	d05032b98a	towards a community led druid	2015-01-31 20:57:36 -08:00
Xavier Léauté	cd9635ff5e	Merge pull request #1034 from druid-io/minor-rename minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 15:46:13 -08:00
fjy	ccddbf8747	minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 14:04:55 -08:00
Fangjin Yang	5bfcc43377	Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate Update all String conversions to and from byte[] to use the java-util StringUtils functions	2015-01-15 13:50:27 -08:00
Fangjin Yang	852e863425	Merge pull request #981 from druid-io/strictModuleTyping Use Module instead of generic Object in Guice related items	2015-01-05 12:43:20 -08:00
Charles Allen	b1b5c9099e	Update all String conversions to and from byte[] to use the java-util StringUtils functions * Speedup of GroupBy with javaScript filters by ~10% * Requires https://github.com/metamx/java-util/pull/15	2015-01-05 11:22:32 -08:00
Xavier Léauté	f1375b0bfb	workaround to pass down bitmap type to map-reduce tasks	2015-01-02 17:29:00 -08:00
Charles Allen	7c8d4a7433	Use Module instead of generic Object in Guice related items	2014-12-19 10:54:06 -08:00
fjy	43d27ddaf0	update http client and fix logging	2014-12-15 16:59:57 -08:00
fjy	e872952390	fix working path default bug	2014-12-15 14:51:58 -08:00
fjy	28b72a69ad	redocumenting ingestion	2014-12-08 16:15:46 -08:00
nishantmonu51	40f223215a	fix buffer pool usage	2014-12-05 16:09:26 +05:30
nishantmonu51	6e03a6245f	Merge branch 'master' into onheap-incremental-index	2014-12-05 10:40:28 +05:30
Xavier Léauté	7cd45a6e1f	IncrementalIndex throws exception if limit exceeded - For now uses a hardcoded ratio of aggregator to timeanddim buffer sizes - canAppendRow is a workaround for realtime index since the Firehose currently does not have a way of rolling back the last event in case of error - canAppendRow needs a fudge factor; there is a race between checking if we can add a row and actually adding a row, because of the way MapDB reports its size.	2014-12-04 14:38:16 -08:00
Gian Merlino	20a7239ffd	Replace google-http-client imports with real guava imports.	2014-12-04 10:57:57 -08:00
Charles Allen	c2add5730b	Fix Hadoop CLI jobs * Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop * Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec	2014-12-04 10:57:56 -08:00
xvrl	c867d59ee0	fix error message	2014-12-03 15:30:32 -08:00
Xavier Léauté	2e6c254937	metadata injection not needed for indexing service	2014-12-03 15:09:31 -08:00
Gian Merlino	d388a8fe89	Replace google-http-client imports with real guava imports.	2014-12-03 10:52:57 -08:00
nishantmonu51	4dc0fdba8a	consider mapped size in limit calculation & review comments	2014-12-03 23:47:30 +05:30
nishantmonu51	da8bd7836b	Introduce buffer size	2014-12-03 16:28:22 +05:30
Charles Allen	7cd689be75	Fix Hadoop CLI jobs * Change "schema" --> "spec" for cli hadoop to keep up with internal hadoop * Added check for HadoopDruidIndexerConfig deserialization from Map to see if it is trying to get a HadoopDruidIndexerConfig or a HadoopIngestionSpec	2014-12-02 11:23:04 -08:00
nishantmonu51	eac776f1a7	tests passing with on heap incremental index	2014-12-02 22:29:28 +05:30
Xavier Léauté	59542c41f8	fix port not set in DruidNode	2014-12-01 14:37:28 -08:00
Charles Allen	8b3652a67a	Modify HadoopDruidIndexerConfig to give a port of 0 instead of -1 when binding DruidNode @Self annotation	2014-12-01 14:08:41 -08:00
fjy	fdeab0c6af	make Druid case sensitive	2014-11-19 14:27:31 -08:00
nishantmonu51	f0452c5968	merge from master	2014-11-18 19:34:51 +05:30
nishantmonu51	edf0fc0851	Make hashed partitions spec default - make hashed partitionsSpec as default partitions spec for 0.7	2014-11-17 19:48:12 +05:30
nishantmonu51	0c2d06475d	merge from master	2014-11-17 19:19:18 +05:30
Xavier Léauté	0498df25df	override metadata storage injection in CliHadoopIndexer	2014-11-07 13:44:56 -08:00
Xavier Léauté	50a191425c	fix injection on MetadataStorageUpdaterJob	2014-11-07 11:11:14 -08:00
Xavier Léauté	20a9aef96a	fix test	2014-11-06 17:27:05 -08:00
Xavier Léauté	9c06db021f	rename db->metadata postgres->postgresql	2014-10-31 10:30:27 -07:00
jisookim0513	aa754b86e8	build success!	2014-10-24 11:28:42 -07:00
fjy	bef74104d9	merge with 0.7.x and resolve any conflicts	2014-10-23 17:24:06 -07:00
fjy	d76d57d95d	update docs	2014-10-22 16:16:28 -07:00
jisookim0513	37979282fe	enabled ansi-quote in mysql; insert statement should now work	2014-10-21 00:09:19 -07:00
jisookim0513	7d5c5f2083	fixed createTable; fixed miscellaneous stuff; added DerbyMetadataRuleManagerProvider	2014-10-17 00:10:36 -07:00
nishantmonu51	41e88baeca	Add test for bucket selection	2014-10-15 23:09:28 +05:30
nishantmonu51	f4a97aebbc	fix rollup for hashed partitions truncate timestamp while calculating the partitionNumber	2014-10-15 22:32:56 +05:30
nishantmonu51	b5d66381f3	more cleanup	2014-10-14 18:32:40 +05:30
nishantmonu51	454acd3f5a	remove backwards compatible code 1) remove backwards compatible and deprecated code 2) make hashed partitions spec default	2014-10-13 19:30:44 +05:30
fjy	c7b4d5b7b4	Merge branch 'master' into druid-0.7.x Conflicts: processing/src/test/java/io/druid/segment/filter/SpatialFilterTest.java	2014-10-02 18:12:10 -07:00
nishantmonu51	ad75a21040	separate offheapIncrementalIndex implementation	2014-10-01 13:58:51 +05:30
jisookim0513	9d7b5d9b0f	fixed javadoc; fixed pom files; deleted unnecessary class	2014-09-30 13:47:35 -07:00
nishantmonu51	358ff915bb	fix merge conflicts	2014-09-30 22:19:18 +05:30
nishantmonu51	2789536bed	merge changes from druid-0.7.x	2014-09-30 22:05:49 +05:30
nishantmonu51	61c7fd2e6e	make ingestOffheap tuneable	2014-09-30 15:30:02 +05:30
nishantmonu51	adb4a65e0a	Merge branch 'offheap-incremental-index' into mapdb-branch	2014-09-29 12:38:31 +05:30
jisookim0513	74565c9371	cleaned up the code	2014-09-27 13:10:01 -07:00
jisookim0513	aa887edb73	added two seperate modules for mysql and postgres	2014-09-27 13:08:53 -07:00
flow	2dd62979bb	Fixed the issue of batch ingestion with indexing service to hdfs end up with the path of metadata in mysql missing "hdfs://host" prefix. The detail describe can be found here: https://groups.google.com/forum/#!topic/druid-development/ofvSxiPpCxI	2014-09-27 22:26:52 +08:00
jisookim0513	6a641621b2	finished merging into druid-0.7.x; derby not working (to be fixed)	2014-09-26 14:24:53 -07:00
jisookim0513	43cc6283d3	trying to revert files that have overwritten changes	2014-09-26 12:38:04 -07:00
fjy	eaf0a48b92	Merge branch 'master' into druid-0.7.x Conflicts: cassandra-storage/pom.xml common/pom.xml examples/pom.xml hdfs-storage/pom.xml histogram/pom.xml indexing-hadoop/pom.xml indexing-service/pom.xml kafka-eight/pom.xml kafka-seven/pom.xml pom.xml processing/pom.xml processing/src/main/java/io/druid/guice/PropertiesModule.java rabbitmq/pom.xml s3-extensions/pom.xml server/pom.xml services/pom.xml	2014-09-26 11:39:24 -07:00
jisookim0513	3bf39cc9f8	attempted to fix merge-conflicts	2014-09-24 15:55:42 -07:00


461 Commits