druid

Commit Graph

Author	SHA1	Message	Date
Fokko Driesprong	67920c114e	Fixed info message (#3481 )	2016-09-21 15:50:29 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Himanshu	3b6c81e7c0	fix cleanup of hadoop ingestion intermediate path (#3385 )	2016-09-08 01:36:56 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Hamlet Lee	e4f0eac8e6	Fix issue #2707 (#2708 )	2016-08-16 12:19:44 -05:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344 ) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314 ) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215 )	2016-07-14 09:19:17 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Nishant	2696b0c451	Retry for transient exceptions while doing cleanup for Hadoop Jobs (#3177 ) * fix 1828 fixes https://github.com/druid-io/druid/issues/1828 * remove unused import * Review comment	2016-06-23 13:38:47 -07:00
Nishant	6f330dc816	Better handling for parseExceptions for Batch Ingestion (#3171 ) * Better handling for parseExceptions * make parseException handling consistent with Realtime * change combiner default val to true * review comments * review comments	2016-06-22 16:38:29 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133 )	2016-06-13 13:10:38 -07:00
Nishant	778f97a80e	attempt to fix-2906 (#2985 ) * attempt to fix-2984 * review comments * Add test	2016-05-18 15:12:38 -05:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Navis Ryu	49ef4d96cb	Merge pull request #2802 from navis/optimize_multiplepath_concat Optimize adding lots of paths to pathspec	2016-04-11 23:35:28 -05:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Gian Merlino	ff25325f3b	Improved docs for multi-value dimensions. - Add central doc for multi-value dimensions, with some content from other docs. - Link to multi-value dimension doc from topN and groupBy docs. - Fixes a broken link from dimensionspecs.md, which was presciently already linking to this nonexistent doc. - Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes "multi-value") in favor of "multi-value".	2016-03-22 14:40:55 -07:00
Himanshu	00d7021291	Merge pull request #2607 from jon-wei/dim_schema Support use of DimensionSchema class in DimensionsSpec	2016-03-22 11:53:46 -05:00
Himanshu	3220b109ad	Merge pull request #2570 from binlijin/single_dimension_partitioning Single dimension hash-based partitioning	2016-03-22 11:51:06 -05:00
binlijin	bce600f5d5	Single dimension hash-based partitioning	2016-03-22 13:15:33 +08:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Himanshu	ea3281ad78	Merge pull request #2645 from atomx/gs-scheme Add gs:// hdfs support	2016-03-14 22:15:42 -05:00
Erik Dubbelboer	375620cfb3	Add gs:// hdfs support Used to access google cloud storage	2016-03-12 08:57:57 +00:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Fangjin Yang	1e49092ce7	Merge pull request #2627 from himanshug/fix_datasource_inputformat_locations fix regression - bug in DatasourceInputFormat best effort split location finder code	2016-03-10 13:46:04 -08:00
Himanshu Gupta	eab8a0b54d	in DatasourceInputFormat code for determining segment block locations avoid the split calulation by helper TextInputFormat	2016-03-10 14:28:53 -06:00
Nishant	ba1185963b	Fix a bunch of dependencies * Eliminate exclusion groups from pull-deps * Only consider dependency nodes in pull-deps if they are not in the following scopes * provided * test * system * Fix a bunch of `<scope>provided</scope>` missing tags * Better exclusions for a couple of problematic libs	2016-03-10 10:18:08 -08:00
Bingkun Guo	c20d7682a9	log exceptions correctly in DatasourceInputFormat and IndexGeneratorJob	2016-03-09 13:41:31 -06:00
gaodayue	a6dc3703ca	use ISODataTimeFormat for both hdfs and viewfs schema to support Federationed HDFS	2016-03-08 13:55:05 +08:00
Björn Zettergren	2462c82c0e	New defaults for maxRowsInMemory rowFlushBoundary To bring consistency to docs and source this commit changes the default values for maxRowsInMemory and rowFlushBoundary to 75000 after discussion in PR https://github.com/druid-io/druid/pull/2457. The previous default was 500000 and it's lower now on the grounds that it's better for a default to be somewhat less efficient, and work, than to reach for the stars and possibly result in "OutOfMemoryError: java heap space" errors.	2016-03-01 13:50:28 +01:00
Gian Merlino	3534483433	Better handling of ParseExceptions. Two changes: - Allow IncrementalIndex to suppress ParseExceptions on "aggregate". - Add "reportParseExceptions" option to realtime tuning configs. By default this is "false". Behavior of the counters should now be: - processed: Number of rows indexed, including rows where some fields could be parsed and some could not. - thrownAway: Number of rows thrown away due to rejection policy. - unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all). If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would cause an exception to be thrown). In addition, "processed" will only include fully parseable rows (because even partial parse failures will cause exceptions to be thrown). Fixes #2510.	2016-02-23 10:11:43 -08:00
Himanshu Gupta	09ffcae4ae	give user the option to specify the segments for dataSource inputSpec	2016-02-21 23:15:31 -06:00
Himanshu Gupta	2faae9d0d1	In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path	2016-02-09 21:42:32 -06:00
Himanshu Gupta	b3437825f0	add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found	2016-01-26 17:23:55 -06:00
binlijin	cd1c71ceb4	rename persistBackgroundCount to numBackgroundPersistThreads	2016-01-22 14:29:41 +08:00
Charles Allen	2a69a58570	Merge pull request #2149 from binlijin/master Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 17:06:42 -08:00
Fangjin Yang	996c1173c6	Merge pull request #2223 from navis/besteffort-split-locations Best effort to find locations for input splits	2016-01-20 16:53:43 -08:00
Fangjin Yang	695f107870	Merge pull request #2302 from metamx/lowerCaseGranPathTest Make GranularityPathSpecTest check with lower-case enums	2016-01-20 09:18:06 -08:00
Charles Allen	3c5ca3a5f2	Make GranularityPathSpecTest check with lower-case enums	2016-01-20 08:35:13 -08:00
binlijin	8e43e2c446	Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 09:20:09 +08:00
jon-wei	747343e621	Preserve dimension order across indexes during ingestion	2016-01-19 13:34:11 -08:00
Jonathan Wei	df2906a91c	Merge pull request #2290 from gianm/index-merger-v9-stuff Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.	2016-01-19 13:04:00 -08:00
Gian Merlino	1dcf22edb7	Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes. Also parameterize some tests to run with/without buildV9Directly: - IndexGeneratorJobTest - RealtimeIndexTaskTest - RealtimePlumberSchoolTest	2016-01-19 12:15:06 -08:00
Himanshu Gupta	164b0aad7a	removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class instead of a Map to store segment metadata	2016-01-18 22:03:46 -06:00
navis.ryu	f03f7fb625	Best effort to find locations for input splits	2016-01-18 08:31:05 +09:00
Kurt Young	82ff98c2bf	add config for build v9 directly and update docs	2016-01-16 11:26:34 +08:00
Kurt Young	1f2168fae5	add IndexMergerV9 add unit tests for IndexMergerV9 and fix some bugs add more unit tests and fix bugs handle null values and add more tests minor changes & use LoggingProgressIndicator in IndexGeneratorReducer make some static class public from IndexMerger minor changes and add some comments changes for comments	2016-01-16 11:25:28 +08:00
navis.ryu	976ebc45c0	Simplify information in IncrementalIndex	2016-01-12 10:18:11 +09:00
dclim	2308c8c07f	continue hadoop job for sparse intervals	2016-01-07 01:35:08 -07:00
fjy	faf421726b	remove IndexMaker	2015-12-28 14:19:02 -08:00
Fangjin Yang	14229ba0f2	Merge pull request #1922 from metamx/jsonIgnoresFinalFields Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-12-18 15:38:32 -08:00
binlijin	219367221b	optimize InputRowSerde	2015-12-09 09:51:56 +08:00
Fangjin Yang	d957a6602c	Merge pull request #2049 from himanshug/hadoop_indexing_unique_path add a unique string to intermediate path for the hadoop indexing task	2015-12-07 11:46:16 -08:00
Himanshu Gupta	6cfaf59d7e	add a unique string to intermediate path for the hadoop indexing task	2015-12-06 22:20:38 -06:00
Himanshu Gupta	62ba9ade37	unifying license header in all java files	2015-12-05 22:16:23 -06:00
Himanshu Gupta	61aaa09012	support multiple intervals in dataSource input spec	2015-12-03 21:28:04 -06:00
Fangjin Yang	21c84b5ff7	Merge pull request #1896 from gianm/allocate-segment SegmentAllocateAction (fixes #1515)	2015-11-18 21:05:46 -08:00
Gian Merlino	e4e5f0375b	SegmentAllocateAction (fixes #1515 ) This is a feature meant to allow realtime tasks to work without being told upfront what shardSpec they should use (so we can potentially publish a variable number of segments per interval). The idea is that there is a "pendingSegments" table in the metadata store that tracks allocated segments. Each one has a segment id (the same segment id we know and love) and is also part of a sequence. The sequences are an idea from @cheddar that offers a way of doing replication. If there are N tasks reading exactly the same data with exactly the same logic (think Kafka tasks reading a fixed range of offsets) then you can place them in the same sequence, and they will generate the same sequence of segments.	2015-11-11 16:54:35 -08:00
Xavier Léauté	fa6142e217	cleanup and remove unused imports	2015-11-11 12:25:21 -08:00
Charles Allen	abae47850a	Add backwards compatability for PR #1922	2015-11-11 10:27:00 -08:00
Gian Merlino	dfbd0e2b60	Merge pull request #1925 from gianm/fix-index-generator Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:56:30 -08:00
Gian Merlino	75122dc396	Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:19:58 -08:00
Himanshu Gupta	6bed633121	do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob	2015-11-06 00:40:51 -06:00
Charles Allen	929b981710	Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-11-05 18:10:13 -08:00
Xavier Léauté	223d1ebe9f	fix a very old todo	2015-11-05 13:00:30 -08:00
fjy	8f231fd3e3	cleanup druid codebase	2015-11-04 13:59:53 -08:00
Himanshu Gupta	84f7d8d264	making static final variables in HadoopDruidIndexerConfig upper case	2015-11-02 23:24:26 -06:00
Himanshu Gupta	8b67417ac8	make methods in Index[Merger,Maker,IO] non-static so that they can have appropriate ObjectMapper injected instead of creating one statically	2015-11-02 23:24:26 -06:00
Himanshu Gupta	aeffeaf3e2	fixing hadoop test scope dependencies in indexing-hadoop	2015-10-26 17:09:39 -05:00
Nishant	3641a0e553	Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582 few fixes delete intermediate file early better exception handling use static pattern instead of compiling it every time Add retry for transient exceptions remove usage of deprecated method. Add test fix imports fix javadoc review comment. review comment: handle crazy snapshot naming review comments remove default retry count in favour of already present constant review comment make random intermediate and final paths. review comment, use temporaryFolder where possible	2015-10-22 21:41:07 +05:30
Xavier Léauté	e4ac78e43d	bump next snapshot to 0.9.0	2015-10-20 13:46:13 -07:00
Xavier Léauté	4c2c7a2c37	update version to 0.8.3	2015-10-14 21:40:55 -07:00
Himanshu Gupta	0368260018	For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE	2015-10-12 22:19:44 -05:00
Gian Merlino	3aba401ee0	SQLMetadataConnector: Retry table creation, in case something goes wrong. Also rejigger table creation methods to not take a DBI. It's already available inside the connector, and everyone was just using that one anyway.	2015-09-24 21:39:36 -07:00
Himanshu Gupta	e8b9ee85a7	HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow	2015-09-16 10:58:13 -05:00
Himanshu Gupta	74f4572bd4	Lazily deserialize "parser" to InputRowParser in DataSchema so that user hadoop related InputRowParsers are created only when needed this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser and not fail because hadoopy InputRowParser might need hadoop libraries	2015-09-16 10:58:13 -05:00
Himanshu Gupta	9ca6106128	user specified hadoop settings are ignored if explicitly set in code	2015-08-31 10:50:18 -05:00
Gian Merlino	940e1aa3eb	Replace funky imports with standard ones. 1) Lots of Guava imports were not coming from the actual Guava 2) junit.framework.Assert should be org.junit.Assert	2015-08-28 18:02:05 -07:00
jon-wei	e5c4927b14	Add support for parsing BytesWritable strings to Hadoop Indexer	2015-08-28 14:27:14 -07:00
Gian Merlino	414a6fb477	Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat. Fixes #1678. IngestSegmentFirehose (and its users) need to remember which windows of which segments should actually be read, based on a timeline.	2015-08-28 07:32:41 -07:00
Himanshu Gupta	2e0dd1d792	adding UTs and addressing review comments to firehoseV2 addition to Realtime[Manager\|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch	2015-08-27 20:50:46 -05:00
lvjq	2237a8cf0f	kafka 8 simple consumer firehose	2015-08-27 20:50:46 -05:00
Charles Allen	e38cf54bc8	Migrate TestDerbyConnector to a JUnit @Rule	2015-08-26 21:47:40 -07:00
Himanshu Gupta	b3c570e78d	update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling	2015-08-20 21:36:34 -05:00
Himanshu Gupta	85e3ce9096	split hadoop glob path before adding it to MultipleInputs This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	a603bd9547	HadoopGlobPathSplitter implementation to split hadoop glob paths This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	cf3ec8eb46	helpful cause explaining why SegmentDescriptorInfo did not exist	2015-08-19 10:29:04 -05:00
Xavier Léauté	3b2e41e42a	update for next release	2015-08-18 17:16:46 -07:00
Himanshu Gupta	a3bab5b7d9	IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing	2015-08-16 14:07:35 -05:00
Himanshu Gupta	15fa43dd43	changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up	2015-08-16 14:07:35 -05:00
Himanshu Gupta	45947a1021	add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java	2015-08-16 13:15:38 -05:00

1 2 3 4 5 ...

852 Commits