druid

Commit Graph

Author	SHA1	Message	Date
T R Kyaw	d6179126ed	Allow index job to utilize hadoop cluster information from job config. (#4626 ) * Allow ndex job to utilize hadoop cluster information from job config. * Add new method that inject system configuration and then job configuration. * Make changes to use HadoopDruidIndexerConfig.addJobProperties method. * refactor code for overloaded addJobProperties.	2017-08-30 16:44:33 -05:00
Gian Merlino	daf3c5f927	Add "round" option to cardinality and hyperUnique aggregators. (#4720 ) * Add "round" option to cardinality and hyperUnique aggregators. Also turn it on by default in SQL, to make math on distinct counts work more as expected. * Fix some compile errors. * Fix test. * Formatting.	2017-08-28 14:52:11 -07:00
Roman Leventov	cbd1902db8	Add forbidden-apis plugin; prohibit using system time zone (#4611 ) * Forbidden APIs WIP * Remove some tests * Restore io.druid.math.expr.Function * Integration tests fix * Add comments * Fix in SimpleWorkerProvisioningStrategy * Formatting * Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest * Address comments * Fix GroupByMultiSegmentTest	2017-08-21 13:02:42 -07:00
Jihoon Son	d5606bc558	Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549 ) * Passing lockTimeout as a parameter for TaskLockbox.lock() * Remove TIME_UNIT * Fix tc fail * Add taskLockTimeout to TaskContext * Add caution	2017-08-08 18:21:07 -07:00
Roman Leventov	f5d4171459	Prohibit for loops which could be foreach with IntelliJ (#4653 ) * Replace for with foreach * Replace for with for-each in GroupByQueryEngineV2 * Remove io.druid.collections.IntList	2017-08-08 18:05:33 -07:00
Roman Leventov	aa7e4ae5e4	Enforce correct spacing with Checkstyle (#4651 )	2017-08-05 10:18:25 -07:00
Yuewen Wang	b154ff0d4a	[Bugfix] null string in map/reduce jvm opts (#4588 ) fix a bug when the cluster don't configure mapreduce.map.java.opts or mapreduce.map.java.opts	2017-07-22 06:22:30 -07:00
Roman Leventov	c0beb78ffd	Enforce brace formatting with Checkstyle (#4564 )	2017-07-21 10:26:59 -05:00
Slim	71e7a4c054	Adding double colums supports (#4491 ) * add double columns support * Fix numbers and expected results in UTs * adding float aggregators * fix IT expected test results * fix comments * more fixes * fix comp * fix test * refactor double and float aggregator factories * fix * fix UTs * fix comments * clean unused code * fix more comments * undo unnecessary changes * fix null issue * refactor TopNColumnSelectorStrategyFactory * fix docs * refactor NumericTopNColumnSelectorStrategy * fix return * fix comments * handle the null case in DimesionIndexer * more null fixing * cosmetic changes	2017-07-20 10:14:14 +03:00
Gian Merlino	441ee56ba9	DataSegmentPusher: Add allowed hadoop property prefixes. (#4562 ) * DataSegmentPusher: Add allowed hadoop property prefixes. * Fix dots.	2017-07-18 10:16:12 -07:00
Roman Leventov	60cdf94677	Add PMD and prohibit unnecessary fully qualified class names in code (#4350 ) * Add PMD and prohibit unnecessary fully qualified class names in code * Extra fixes * Remove extra unnecessary fully-qualified names * Remove qualifiers * Remove qualifier	2017-07-17 22:22:29 +09:00
Slim	c5c17bb803	Fix issue 4536 suggested by @@erikdubbelboer (#4541 )	2017-07-13 14:14:31 -07:00
Parag Jain	6e2f78f552	TLS support (#4270 )	2017-07-06 17:40:12 -07:00
Roman Leventov	9ae457f7ad	Avoid using the default system Locale and printing to System.out in production code (#4409 ) * Avoid usages of Default system Locale and printing to System.out or System.err in production code * Fix Charset in DruidKerberosUtil * Remove redundant string format in GenericIndexed * Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format() * Fix testSafeFormat() * More fixes of redundant StringUtils.format() inside ISE * Rename unimportantSafeFormat() to nonStrictFormat()	2017-06-29 14:06:19 -07:00
Roman Leventov	05d58689ad	Remove the ability to create segments in v8 format (#4420 ) * Remove ability to create segments in v8 format * Fix IndexGeneratorJobTest * Fix parameterized test name in IndexMergerTest * Remove extra legacy merging stuff * Remove legacy serializer builders * Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest	2017-06-26 13:21:39 -07:00
Goh Wei Xiang	f68a0693f3	Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397 ) * Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe. * - Use of computeIfAbsent over putIfAbsent - Replace Maps.newXXXMap() with normal instantiation - Documentations on when is thread-safe required. - Use Builders for On/OffheapIncrementalIndex * - Optimization on computeIfAbsent - Constant EMPTY DimensionsSpec - Improvement on IncrementalIndexSchema.Builder - Remove setting of default values - Use var args for metrics - Correction on On/OffheapIncrementalIndex Builders - Combine On/OffheapIncrementalIndex Builders * - Removing unused imports. * - Helper method for testing with IncrementalIndex.Builder * - Correction on javadoc. * Style fix	2017-06-16 16:04:19 -05:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Roman Leventov	63a897c278	Enable most IntelliJ 'Probable bugs' inspections (#4353 ) * Enable most IntelliJ 'Probable bugs' inspections * Fix in RemoteTestNG * Fix IndexSpec's equals() and hashCode() to include longEncoding * Fix inspection errors * Extract global isntance of natural().nullsFirst(); address comments * Fix * Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections * Prohibit Ordering.natural().nullsFirst() using Checkstyle	2017-06-07 09:54:25 -07:00
Himanshu	4ace65a2af	fix NPE in IndexGeneratorJob (#4371 ) * fix NPE in IndexGeneratorJob * address review comment * review comments	2017-06-07 05:54:03 -07:00
Roman Leventov	31d33b333e	Make using implicit system Charset an error (#4326 ) * Make using implicit system charset an error * Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String() * Use English locale in StringUtils.safeFormat() * Restore comment	2017-06-05 23:57:25 -07:00
Slim	a2584d214a	Delagate creation of segmentPath/LoadSpec to DataSegmentPushers and add S3a support (#4116 ) * Adding s3a schema and s3a implem to hdfs storage module. * use 2.7.3 * use segment pusher to make loadspec * move getStorageDir and makeLoad spec under DataSegmentPusher * fix uts * fix comment part1 * move to hadoop 2.8 * inject deep storage properties * set version to 2.7.3 * fix build issue about static class * fix comments * fix default hadoop default coordinate * fix create filesytem * downgrade aws sdk * bump the version	2017-06-04 00:55:09 -06:00
Goh Wei Xiang	b77fab8a30	Replace usages of CountingMap with Object2LongMap (#4320 ) * Replaces use of CountingMap with Object2LongMap from fastutil. * Remove CountingMap classes and minor fixes * Added additional test cases for DatasourceInputFormat. * Added additional test cases for CoordinatorStats. * Not materializing segment list. * Put in this fix because it is failing the test on its expected behavior. * Added missing header.	2017-05-24 17:40:32 -07:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274 ) * Make MissingOverride an error * Make travis stript to fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
Benedict Jin	e823085866	Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135 )	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254 ) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Roman Leventov	1ebfa22955	Update Error prone configuration; Fix bugs (#4252 ) * Make Errorprone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule a error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Pierre	bba31e0c8b	close aggregators in indexing-hadoop mappers (#4251 )	2017-05-05 08:29:13 -07:00
Pierre	e9872f0695	do not flush on closed stream (#4250 )	2017-05-05 09:19:20 +09:00
Roman Leventov	8277284d67	Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239 )	2017-05-04 11:14:41 -07:00
Gian Merlino	97ddb38d75	DatasourceInputSplit: Serialize with write instead of writeUTF. (#4195 ) writeUTF has a limit of 64KB, making it difficult to write out splits that read a large number of descriptors for small segments.	2017-04-25 10:26:44 -07:00
Gian Merlino	b4289c0004	Remove "granularity" from IngestSegmentFirehose. (#4110 ) It wasn't doing anything useful (the sequences were being concatted, and cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE. Changing it to Granularities.ALL gave me a 700x+ performance boost on a small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding making a lot of unnecessary column selectors.	2017-03-24 10:28:54 -07:00
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Akash Dwivedi	797488a677	Removing Integer.MAX column size limit. (#3743 ) * Removing Integer.MAX column size limit. * On demand creation of headerLong, use v2 instead of v3 * Avoid reusing the same object from a previous test. * Avoid reusing the same object from a previous test part#2 * code formatting. * GenericIndexed/Writer code review changes. * GenericIndexed/writer code review requested changes. * checkIndex() to static * native endianess for genericIndexedV2, code review requested changes. * Formatting * Hll fix. * use native endianess during bag size calculation. * Code review requested changes. * IOPeon close() changes. * use different tmp directory path for testing. * Code review requested changes.	2017-02-16 20:09:43 -06:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
baruchz	b7a88706f3	Add maprfs scheme (#3920 ) Add maprfs scheme to JobHelper to treated as HDFS deep storage	2017-02-12 18:37:58 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Himanshu	7ced0e8759	log sizes of created smoosh files (#3817 ) * log when merging of intermediate segments starts during batch ingestion * log sizes of created smoosh files	2017-01-04 16:52:22 -08:00
Gian Merlino	d8702ebece	Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797 ) * Filters: Use ColumnSelectorFactory directly for building row-based matchers. * Adjustments based on code review. - BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing. - HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec not thread safe. - Renamed rowType to rowSignature. - Added specializations for time-based vs non-time-based DimensionSelector in RBCSF. - Added convenience method DimensionHanderUtils.createColumnSelectorPlus. - Added singleton ZeroIndexedInts. - Added test cases for DimFilterHavingSpec. * Make ValueMatcherColumnSelectorStrategy actually use the associated selector. * Add RangeIndexedInts. * DimFilterHavingSpec: Fix concurrent usage guard on jdk7. * Add assertion to ZeroIndexedInts. * Rename no-longer-volatile members.	2017-01-03 14:30:22 -08:00
Erik Dubbelboer	c0c34f82ad	Fix reindexing of segments in Google Cloud Storage (#3788 ) Google Cloud Storage allows `:` in paths. For this reason `google` was not added to `da007ca3c2/indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java (L585)` Normally this is not an issue but when reindexing segments the Hadoop code for `getSplits` trips up on the `:` and returns: `Relative path in absolute URI` This patch URL encodes the `:` character making it work in Hadoop as well.	2016-12-20 17:16:33 -08:00
Jonathan Wei	880a021a7a	Fix missed travis failures from PR 3567 and 2798 (#3761 ) * Fix checkstyle failures from PR 3567 * Fix GranularityPathSpecTest compile failure	2016-12-07 19:07:31 -08:00
Navis Ryu	f794246ec1	Trimming out outside of given interval (#2798 ) * Trimming out outside of given interval (Fix for #2659) * addressed comments	2016-12-07 18:05:50 -08:00
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701 )	2016-11-22 16:42:28 -08:00
Erik Dubbelboer	7d36f540e8	WIP: Add Google Storage support (#2458 ) Also excludes the correct artifacts from #2741	2016-11-16 14:06:45 +05:30
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
praveev	52a74cf84f	Use timestamp in millis as Map key instead of DateTime object (#3674 ) * Use Long timestamp as key instead of DateTime. DateTime representation is screwed up when you store with an obj and read with a different DateTime obj. For example: The code below fails when you use DateTime as key ``` DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles"))); HashMap<DateTime, String> map = new HashMap<>(); map.put(odt, "abc"); DateTime dt = new DateTime(odt.getMillis()); System.out.println(map.get(dt)); ``` * Respect timezone when creating the file. * Update docs with timezone caveat in granularity spec * Remove unused imports	2016-11-11 10:20:20 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Akash Dwivedi	6a845e1f7b	Adding getDelegate() to directly access delegate. (#3616 ) 👍	2016-10-27 15:57:36 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Gian Merlino	dd0bb6da1e	Unit test for #3544 : Avoid exceptions for dataSource spec when using s3. (#3571 )	2016-10-17 12:41:43 -07:00
Navis Ryu	4554c1214b	Avoid exceptions for dataSource spec when using s3 (#3544 )	2016-10-14 18:24:19 -07:00
Akash Dwivedi	078de4fcf9	Use explicit version from HadoopIngestionSpec. (#3554 )	2016-10-07 13:59:14 -07:00
praveev	43cdc675c7	Add support for timezone in segment granularity (#3528 ) * Add support for timezone in segment granularity * CR feedback. Handle null timezone during equals check. * Include timezone in docs. Add timezone for ArbitraryGranularitySpec.	2016-10-03 08:15:42 -07:00
Fokko Driesprong	67920c114e	Fixed info message (#3481 )	2016-09-21 15:50:29 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Himanshu	3b6c81e7c0	fix cleanup of hadoop ingestion intermediate path (#3385 )	2016-09-08 01:36:56 +05:30
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Hamlet Lee	e4f0eac8e6	Fix issue #2707 (#2708 )	2016-08-16 12:19:44 -05:00
Gian Merlino	a2bcd97512	IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344 ) They had arrays as values, which MapBasedRow doesn't understand and toStrings rather than converting to lists.	2016-08-10 08:47:29 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314 ) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215 )	2016-07-14 09:19:17 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Nishant	2696b0c451	Retry for transient exceptions while doing cleanup for Hadoop Jobs (#3177 ) * fix 1828 fixes https://github.com/druid-io/druid/issues/1828 * remove unused import * Review comment	2016-06-23 13:38:47 -07:00
Nishant	6f330dc816	Better handling for parseExceptions for Batch Ingestion (#3171 ) * Better handling for parseExceptions * make parseException handling consistent with Realtime * change combiner default val to true * review comments * review comments	2016-06-22 16:38:29 -07:00
Nishant	778f97a80e	attempt to fix-2906 (#2985 ) * attempt to fix-2984 * review comments * Add test	2016-05-18 15:12:38 -05:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Navis Ryu	49ef4d96cb	Merge pull request #2802 from navis/optimize_multiplepath_concat Optimize adding lots of paths to pathspec	2016-04-11 23:35:28 -05:00
jon-wei	0e481d6f93	Allow filters to use extraction functions	2016-04-05 13:24:56 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Gian Merlino	ff25325f3b	Improved docs for multi-value dimensions. - Add central doc for multi-value dimensions, with some content from other docs. - Link to multi-value dimension doc from topN and groupBy docs. - Fixes a broken link from dimensionspecs.md, which was presciently already linking to this nonexistent doc. - Resolve inconsistent naming in docs & code (sometimes "multi-valued", sometimes "multi-value") in favor of "multi-value".	2016-03-22 14:40:55 -07:00
Himanshu	00d7021291	Merge pull request #2607 from jon-wei/dim_schema Support use of DimensionSchema class in DimensionsSpec	2016-03-22 11:53:46 -05:00
binlijin	bce600f5d5	Single dimension hash-based partitioning	2016-03-22 13:15:33 +08:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Himanshu	ea3281ad78	Merge pull request #2645 from atomx/gs-scheme Add gs:// hdfs support	2016-03-14 22:15:42 -05:00
Erik Dubbelboer	375620cfb3	Add gs:// hdfs support Used to access google cloud storage	2016-03-12 08:57:57 +00:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Himanshu Gupta	eab8a0b54d	in DatasourceInputFormat code for determining segment block locations avoid the split calulation by helper TextInputFormat	2016-03-10 14:28:53 -06:00
Bingkun Guo	c20d7682a9	log exceptions correctly in DatasourceInputFormat and IndexGeneratorJob	2016-03-09 13:41:31 -06:00
gaodayue	a6dc3703ca	use ISODataTimeFormat for both hdfs and viewfs schema to support Federationed HDFS	2016-03-08 13:55:05 +08:00
Björn Zettergren	2462c82c0e	New defaults for maxRowsInMemory rowFlushBoundary To bring consistency to docs and source this commit changes the default values for maxRowsInMemory and rowFlushBoundary to 75000 after discussion in PR https://github.com/druid-io/druid/pull/2457. The previous default was 500000 and it's lower now on the grounds that it's better for a default to be somewhat less efficient, and work, than to reach for the stars and possibly result in "OutOfMemoryError: java heap space" errors.	2016-03-01 13:50:28 +01:00
Gian Merlino	3534483433	Better handling of ParseExceptions. Two changes: - Allow IncrementalIndex to suppress ParseExceptions on "aggregate". - Add "reportParseExceptions" option to realtime tuning configs. By default this is "false". Behavior of the counters should now be: - processed: Number of rows indexed, including rows where some fields could be parsed and some could not. - thrownAway: Number of rows thrown away due to rejection policy. - unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all). If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would cause an exception to be thrown). In addition, "processed" will only include fully parseable rows (because even partial parse failures will cause exceptions to be thrown). Fixes #2510.	2016-02-23 10:11:43 -08:00
Himanshu Gupta	09ffcae4ae	give user the option to specify the segments for dataSource inputSpec	2016-02-21 23:15:31 -06:00
Himanshu Gupta	2faae9d0d1	In JobHelper.makeSegmentOutputPath(..) use DataSegmentPusherUtils to construct the segment storage path	2016-02-09 21:42:32 -06:00
Himanshu Gupta	b3437825f0	add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found	2016-01-26 17:23:55 -06:00
binlijin	cd1c71ceb4	rename persistBackgroundCount to numBackgroundPersistThreads	2016-01-22 14:29:41 +08:00
Charles Allen	2a69a58570	Merge pull request #2149 from binlijin/master Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 17:06:42 -08:00
Fangjin Yang	996c1173c6	Merge pull request #2223 from navis/besteffort-split-locations Best effort to find locations for input splits	2016-01-20 16:53:43 -08:00
Fangjin Yang	695f107870	Merge pull request #2302 from metamx/lowerCaseGranPathTest Make GranularityPathSpecTest check with lower-case enums	2016-01-20 09:18:06 -08:00
Charles Allen	3c5ca3a5f2	Make GranularityPathSpecTest check with lower-case enums	2016-01-20 08:35:13 -08:00
binlijin	8e43e2c446	Do persist IncrementalIndex in another thread in IndexGeneratorReducer	2016-01-20 09:20:09 +08:00
jon-wei	747343e621	Preserve dimension order across indexes during ingestion	2016-01-19 13:34:11 -08:00
Jonathan Wei	df2906a91c	Merge pull request #2290 from gianm/index-merger-v9-stuff Respect buildV9Directly in PlumberSchools, so it works on standalone realtime.	2016-01-19 13:04:00 -08:00
Gian Merlino	1dcf22edb7	Respect buildV9Directly in PlumberSchools, so it works on standalone realtime nodes. Also parameterize some tests to run with/without buildV9Directly: - IndexGeneratorJobTest - RealtimeIndexTaskTest - RealtimePlumberSchoolTest	2016-01-19 12:15:06 -08:00
Himanshu Gupta	164b0aad7a	removing Map<String,Object> segmentMetadata from methods in Index[Maker/Merger] and using Metadata class instead of a Map to store segment metadata	2016-01-18 22:03:46 -06:00
navis.ryu	f03f7fb625	Best effort to find locations for input splits	2016-01-18 08:31:05 +09:00
Kurt Young	82ff98c2bf	add config for build v9 directly and update docs	2016-01-16 11:26:34 +08:00
Kurt Young	1f2168fae5	add IndexMergerV9 add unit tests for IndexMergerV9 and fix some bugs add more unit tests and fix bugs handle null values and add more tests minor changes & use LoggingProgressIndicator in IndexGeneratorReducer make some static class public from IndexMerger minor changes and add some comments changes for comments	2016-01-16 11:25:28 +08:00
navis.ryu	976ebc45c0	Simplify information in IncrementalIndex	2016-01-12 10:18:11 +09:00
dclim	2308c8c07f	continue hadoop job for sparse intervals	2016-01-07 01:35:08 -07:00
fjy	faf421726b	remove IndexMaker	2015-12-28 14:19:02 -08:00
Fangjin Yang	14229ba0f2	Merge pull request #1922 from metamx/jsonIgnoresFinalFields Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-12-18 15:38:32 -08:00
binlijin	219367221b	optimize InputRowSerde	2015-12-09 09:51:56 +08:00
Fangjin Yang	d957a6602c	Merge pull request #2049 from himanshug/hadoop_indexing_unique_path add a unique string to intermediate path for the hadoop indexing task	2015-12-07 11:46:16 -08:00
Himanshu Gupta	6cfaf59d7e	add a unique string to intermediate path for the hadoop indexing task	2015-12-06 22:20:38 -06:00
Himanshu Gupta	62ba9ade37	unifying license header in all java files	2015-12-05 22:16:23 -06:00
Himanshu Gupta	61aaa09012	support multiple intervals in dataSource input spec	2015-12-03 21:28:04 -06:00
Fangjin Yang	21c84b5ff7	Merge pull request #1896 from gianm/allocate-segment SegmentAllocateAction (fixes #1515)	2015-11-18 21:05:46 -08:00
Gian Merlino	e4e5f0375b	SegmentAllocateAction (fixes #1515 ) This is a feature meant to allow realtime tasks to work without being told upfront what shardSpec they should use (so we can potentially publish a variable number of segments per interval). The idea is that there is a "pendingSegments" table in the metadata store that tracks allocated segments. Each one has a segment id (the same segment id we know and love) and is also part of a sequence. The sequences are an idea from @cheddar that offers a way of doing replication. If there are N tasks reading exactly the same data with exactly the same logic (think Kafka tasks reading a fixed range of offsets) then you can place them in the same sequence, and they will generate the same sequence of segments.	2015-11-11 16:54:35 -08:00
Xavier Léauté	fa6142e217	cleanup and remove unused imports	2015-11-11 12:25:21 -08:00
Charles Allen	abae47850a	Add backwards compatability for PR #1922	2015-11-11 10:27:00 -08:00
Gian Merlino	dfbd0e2b60	Merge pull request #1925 from gianm/fix-index-generator Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:56:30 -08:00
Gian Merlino	75122dc396	Fix reference to INDEX_MAKER in IndexGeneratorJob.	2015-11-06 09:19:58 -08:00
Himanshu Gupta	6bed633121	do not use LoggingProcessIndicator in IndexGeneratorJob because that uses Stopwatch methods from guava not available in older guava versions, this makes the behavior same as LegacyIndexGeneratorJob	2015-11-06 00:40:51 -06:00
Charles Allen	929b981710	Change DefaultObjectMapper to NOT overwrite final fields unless explicitly asked to	2015-11-05 18:10:13 -08:00
Xavier Léauté	223d1ebe9f	fix a very old todo	2015-11-05 13:00:30 -08:00
fjy	8f231fd3e3	cleanup druid codebase	2015-11-04 13:59:53 -08:00
Himanshu Gupta	84f7d8d264	making static final variables in HadoopDruidIndexerConfig upper case	2015-11-02 23:24:26 -06:00
Himanshu Gupta	8b67417ac8	make methods in Index[Merger,Maker,IO] non-static so that they can have appropriate ObjectMapper injected instead of creating one statically	2015-11-02 23:24:26 -06:00
Nishant	3641a0e553	Fix Race in jar upload during hadoop indexing - https://github.com/druid-io/druid/issues/582 few fixes delete intermediate file early better exception handling use static pattern instead of compiling it every time Add retry for transient exceptions remove usage of deprecated method. Add test fix imports fix javadoc review comment. review comment: handle crazy snapshot naming review comments remove default retry count in favour of already present constant review comment make random intermediate and final paths. review comment, use temporaryFolder where possible	2015-10-22 21:41:07 +05:30
Himanshu Gupta	0368260018	For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE	2015-10-12 22:19:44 -05:00
Gian Merlino	3aba401ee0	SQLMetadataConnector: Retry table creation, in case something goes wrong. Also rejigger table creation methods to not take a DBI. It's already available inside the connector, and everyone was just using that one anyway.	2015-09-24 21:39:36 -07:00
Himanshu Gupta	e8b9ee85a7	HadoopyStringInputRowParser to convert stringy Text, BytesWritable etc into InputRow	2015-09-16 10:58:13 -05:00
Himanshu Gupta	74f4572bd4	Lazily deserialize "parser" to InputRowParser in DataSchema so that user hadoop related InputRowParsers are created only when needed this allows overlord to accept a HadoopIndexTask with a hadoopy InputRowParser and not fail because hadoopy InputRowParser might need hadoop libraries	2015-09-16 10:58:13 -05:00
Himanshu Gupta	9ca6106128	user specified hadoop settings are ignored if explicitly set in code	2015-08-31 10:50:18 -05:00
Gian Merlino	940e1aa3eb	Replace funky imports with standard ones. 1) Lots of Guava imports were not coming from the actual Guava 2) junit.framework.Assert should be org.junit.Assert	2015-08-28 18:02:05 -07:00
jon-wei	e5c4927b14	Add support for parsing BytesWritable strings to Hadoop Indexer	2015-08-28 14:27:14 -07:00
Gian Merlino	414a6fb477	Fix overlapping segments in IngestSegmentFirehose, DatasourceInputFormat. Fixes #1678. IngestSegmentFirehose (and its users) need to remember which windows of which segments should actually be read, based on a timeline.	2015-08-28 07:32:41 -07:00
Himanshu Gupta	2e0dd1d792	adding UTs and addressing review comments to firehoseV2 addition to Realtime[Manager\|Plumber], essential segment metadata persist support, kafka-simple-consumer-firehose extension patch	2015-08-27 20:50:46 -05:00
lvjq	2237a8cf0f	kafka 8 simple consumer firehose	2015-08-27 20:50:46 -05:00
Charles Allen	e38cf54bc8	Migrate TestDerbyConnector to a JUnit @Rule	2015-08-26 21:47:40 -07:00
Himanshu Gupta	b3c570e78d	update BatchDeltaIngestion.testDeltaIngestion(..) to check for proper glob path handling	2015-08-20 21:36:34 -05:00
Himanshu Gupta	85e3ce9096	split hadoop glob path before adding it to MultipleInputs This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	a603bd9547	HadoopGlobPathSplitter implementation to split hadoop glob paths This can be safely reverted once https://issues.apache.org/jira/browse/MAPREDUCE-5061 is fixed	2015-08-20 21:36:34 -05:00
Himanshu Gupta	cf3ec8eb46	helpful cause explaining why SegmentDescriptorInfo did not exist	2015-08-19 10:29:04 -05:00
Himanshu Gupta	a3bab5b7d9	IndexGeneratorJobTest type unit test for batch delta ingestion and reindexing	2015-08-16 14:07:35 -05:00
Himanshu Gupta	15fa43dd43	changing DatasourcePathSpec, to get segment list, so that hadoop indexer uses overlord action to get list of segments and passes when running as an overlord task. and, uses metadata store directly when running as standalone hadoop indexer also, serialized list of segments is passed to DatasourcePathSpec so that hadoop classloader issues do not creep up	2015-08-16 14:07:35 -05:00
Himanshu Gupta	45947a1021	add ability to specify Multiple PathSpecs in batch ingestion, so that we can grab data from multiple places in same ingestion Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/HadoopDruidIndexerConfig.java indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java	2015-08-16 13:15:38 -05:00
Himanshu Gupta	1ae56f139b	Druid Hadoop InputFormat and pathSpec Conflicts: indexing-hadoop/src/main/java/io/druid/indexer/path/PathSpec.java indexing-service/pom.xml	2015-08-16 13:15:38 -05:00
Himanshu Gupta	f1d309a671	do not run parser if value from InputFormat is already an InputRow	2015-08-14 14:44:22 -05:00
Himanshu Gupta	0eec1bbee2	json serde tests for HadoopTuningConfig	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f836c3a7ac	adding flag useCombiner to hadoop tuning config that can be used to add a hadoop combiner to hadoop batch ingestion to do merges on the mappers if possible	2015-07-20 12:01:53 -05:00
Himanshu Gupta	4ef484048a	take control of InputRow serde between Mapper/Reducer in Hadoop Indexing This allows for arbitrary InputFormat while hadoop batch ingestion that can return records of value type other than Text	2015-07-20 12:01:53 -05:00
Himanshu Gupta	f7a92db332	generic byte[] serde for InputRow	2015-07-20 12:01:53 -05:00
Charles Allen	b2bc46be17	Merge pull request #1484 from tubemogul/feature/1463 JobHelper.ensurePaths will set job properties from config (tuningConf…	2015-07-07 10:58:16 -07:00
Michael Schiff	6ad451a44a	JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config. Adding input paths will create Path and FileSystem instances which may depend on the values in the job config. This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files. IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set) JobHelperTest confirms that JobHelper.ensurePaths adds job properties javadoc for addInputPaths to explain relationship with addJobProperties	2015-07-01 12:45:32 -07:00
Davide Anastasia	4a3a7dd1ad	read hadoop-indexer configuration file from HDFS	2015-06-24 14:08:53 -07:00
Hao Xia	1931491c9f	A couple of hdfs related fixes * Class loading issue with hdfs-storage extension * Exception when using hdfs with non-fully qualified segment path	2015-06-19 17:22:20 -07:00
Charles Allen	94a567732a	Wipe FileContext off the face of the earth * Fixes https://github.com/druid-io/druid/issues/1433 * Works arround https://issues.apache.org/jira/browse/HADOOP-10643 * Reverts to the prior method of renaming	2015-06-16 09:48:09 -07:00
Charles Allen	6230ac90ae	Use IndexMerger for conversion	2015-06-10 11:34:58 -07:00
Charles Allen	056cab93ed	Add Hadoop Converter Job and task * Fixes https://github.com/druid-io/druid/issues/1363 * Add extra utils in JobHelper based on PR feedback	2015-06-09 14:47:38 -07:00
Charles Allen	2a76bdc60a	Abstractify hadoopy indexer configuration. * Moves many items to JobHelper * Remove dependencies of these functions on HadoopDruidIndexerConfig in favor of more general items * Changes functionalities of some of the path methods to always return a path with scheme * Adds retry to uploads * Change output loadSpec determining from using outputFS.getClass().getName() to using outputFS.getScheme()	2015-06-08 10:53:27 -07:00
fjy	be2a35188e	Additional schema validations and better logs for common extensions	2015-05-27 16:25:02 -07:00
Xavier Léauté	4466e77b25	Merge pull request #1371 from guobingkun/unit_test Unit test for IndexGeneratorJob	2015-05-22 10:34:24 -04:00
flow	07659f30ab	bug fix: hdfs task log and indexing task not work properly with Hadoop HA	2015-05-21 20:49:42 +08:00
Bingkun Guo	b46aff12ae	Unit test for IndexGeneratorJob	2015-05-18 12:31:16 -05:00
Fangjin Yang	a2dc58cd2d	Merge pull request #1345 from pjain1/unit_test_warn_fix fix warn msg and some unit tests	2015-05-08 08:06:20 -07:00
Parag Jain	01448d264c	Fix warn msg and added some unit tests	2015-05-07 17:10:05 -05:00
fjy	b19435d172	fix typos with batch ingestion in docs	2015-05-07 14:46:17 -07:00
Bingkun Guo	1ee550dd91	Fix a potential issue in DeterminePartitionsJob by making HadoopDruidIndexerConfig non-static, and two unit tests for DeterminPartitionsJob and LocalDataSegmentKiller	2015-05-04 20:00:29 -07:00
Xavier Léauté	3a3046ccf3	add support for dimension compression - compression for single-value dimensions using CompressedVSizeIntsIndexedSupplier - makes dimension compression configurable via IndexSpec - IndexSpec also enables configuring bitmap and metric compression	2015-04-14 10:44:18 -07:00
Prajwal Tuladhar	3044bf5592	use Job.getInstance() to fix deprecated warnings	2015-04-09 13:22:21 -04:00
Xavier Léauté	8b5fa8f85d	always upload SNAPSHOT self-contained jars	2015-04-03 21:18:09 -07:00
Dia Kharrat	3a6dc99384	log invalid rows in mapper of Hadoop indexer	2015-03-19 22:31:04 -07:00
Dia Kharrat	58d5f5e7f0	Honor ignoreInvalidRows in Hadoop indexer The reducer of the hadoop indexer now ignores lines with parsing exceptions (if enabled by the indexer config).	2015-03-19 22:31:04 -07:00
Himanshu Gupta	8c1f0834ba	Removing MapWritableInputRowParser from indexing-hadoop it should really be an extension if user needs	2015-03-19 18:37:08 -05:00
Himanshu Gupta	3f7a7ba5d3	For batch hadoop indexing, make hadoop input format configuration. Given input format must extend from org.apache.hadoop.mapreduce.InputFormat	2015-03-18 16:09:45 -05:00
fjy	bfe10bd156	This fixes arbitrary gran spec breaking	2015-03-17 12:19:43 -07:00
Himanshu Gupta	6a0405de20	fail early if there is no input data for batch hadoop indexing	2015-03-07 12:45:57 -06:00
Himanshu Gupta	30f64ff19e	UTs update for indexing-hadoop	2015-02-25 15:45:57 -08:00
Xavier Léauté	0784d7e30e	Merge pull request #1152 from himanshug/metastorage-pwd-provider support for metadata store PasswordProvider interface	2015-02-25 15:19:37 -08:00
Fangjin Yang	708f35151d	Merge pull request #1121 from gianm/issue-1116 Use the proper FileSystems for writing segments and caching jars. (for issue #1116)	2015-02-25 13:03:59 -08:00
Fangjin Yang	6424815f88	Merge pull request #1097 from metamx/better-hadoop-sort-key Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-25 12:49:58 -08:00
Himanshu Gupta	126262edce	support for PasswordProvider interface to enable writing druid extension which can get metadata store password from secured location or anywhere instead of plain text properties file	2015-02-25 14:05:19 -06:00
Himanshu Gupta	01a4f19ea2	removing dependency on NativeS3FileSystem and other file systems	2015-02-23 14:27:50 -06:00
Gian Merlino	fd5a7d1f08	Use the proper FileSystems for writing segments and caching jars. (for issue #1116 )	2015-02-12 16:20:10 -08:00
Xavier Léauté	b1ec7afc12	Sort HadoopIndexer rows by time+dim bucket to help reduce spilling	2015-02-10 14:26:28 -08:00
Fangjin Yang	92e616de11	Merge pull request #1077 from metamx/remove-unused-imports remove unused imports	2015-02-02 10:45:27 -08:00
nishantmonu51	ba932bb1f2	remove unused imports	2015-02-02 21:53:39 +05:30
fjy	d05032b98a	towards a community led druid	2015-01-31 20:57:36 -08:00
Xavier Léauté	cd9635ff5e	Merge pull request #1034 from druid-io/minor-rename minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 15:46:13 -08:00
fjy	ccddbf8747	minor rename of things in hadoop ingestion config to match 0.6.x	2015-01-15 14:04:55 -08:00
Fangjin Yang	5bfcc43377	Merge pull request #1008 from metamx/stringConversionJavaUtilUpdate Update all String conversions to and from byte[] to use the java-util StringUtils functions	2015-01-15 13:50:27 -08:00
Fangjin Yang	852e863425	Merge pull request #981 from druid-io/strictModuleTyping Use Module instead of generic Object in Guice related items	2015-01-05 12:43:20 -08:00
Charles Allen	b1b5c9099e	Update all String conversions to and from byte[] to use the java-util StringUtils functions * Speedup of GroupBy with javaScript filters by ~10% * Requires https://github.com/metamx/java-util/pull/15	2015-01-05 11:22:32 -08:00
Xavier Léauté	f1375b0bfb	workaround to pass down bitmap type to map-reduce tasks	2015-01-02 17:29:00 -08:00
Charles Allen	7c8d4a7433	Use Module instead of generic Object in Guice related items	2014-12-19 10:54:06 -08:00
fjy	43d27ddaf0	update http client and fix logging	2014-12-15 16:59:57 -08:00
fjy	e872952390	fix working path default bug	2014-12-15 14:51:58 -08:00

... 2 3 4 5 6 ...

507 Commits