druid

Commit Graph

Author	SHA1	Message	Date
Goh Wei Xiang	f68a0693f3	Allow use of non-threadsafe ObjectCachingColumnSelectorFactory (#4397 ) * Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe. * - Use of computeIfAbsent over putIfAbsent - Replace Maps.newXXXMap() with normal instantiation - Documentations on when is thread-safe required. - Use Builders for On/OffheapIncrementalIndex * - Optimization on computeIfAbsent - Constant EMPTY DimensionsSpec - Improvement on IncrementalIndexSchema.Builder - Remove setting of default values - Use var args for metrics - Correction on On/OffheapIncrementalIndex Builders - Combine On/OffheapIncrementalIndex Builders * - Removing unused imports. * - Helper method for testing with IncrementalIndex.Builder * - Correction on javadoc. * Style fix	2017-06-16 16:04:19 -05:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Roman Leventov	63a897c278	Enable most IntelliJ 'Probable bugs' inspections (#4353 ) * Enable most IntelliJ 'Probable bugs' inspections * Fix in RemoteTestNG * Fix IndexSpec's equals() and hashCode() to include longEncoding * Fix inspection errors * Extract global isntance of natural().nullsFirst(); address comments * Fix * Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections * Prohibit Ordering.natural().nullsFirst() using Checkstyle	2017-06-07 09:54:25 -07:00
Himanshu	4ace65a2af	fix NPE in IndexGeneratorJob (#4371 ) * fix NPE in IndexGeneratorJob * address review comment * review comments	2017-06-07 05:54:03 -07:00
Roman Leventov	31d33b333e	Make using implicit system Charset an error (#4326 ) * Make using implicit system charset an error * Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String() * Use English locale in StringUtils.safeFormat() * Restore comment	2017-06-05 23:57:25 -07:00
Slim	a2584d214a	Delagate creation of segmentPath/LoadSpec to DataSegmentPushers and add S3a support (#4116 ) * Adding s3a schema and s3a implem to hdfs storage module. * use 2.7.3 * use segment pusher to make loadspec * move getStorageDir and makeLoad spec under DataSegmentPusher * fix uts * fix comment part1 * move to hadoop 2.8 * inject deep storage properties * set version to 2.7.3 * fix build issue about static class * fix comments * fix default hadoop default coordinate * fix create filesytem * downgrade aws sdk * bump the version	2017-06-04 00:55:09 -06:00
Goh Wei Xiang	b77fab8a30	Replace usages of CountingMap with Object2LongMap (#4320 ) * Replaces use of CountingMap with Object2LongMap from fastutil. * Remove CountingMap classes and minor fixes * Added additional test cases for DatasourceInputFormat. * Added additional test cases for CoordinatorStats. * Not materializing segment list. * Put in this fix because it is failing the test on its expected behavior. * Added missing header.	2017-05-24 17:40:32 -07:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274 ) * Make MissingOverride an error * Make travis stript to fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
Benedict Jin	e823085866	Improve `collection` related things that reusing a immutable object instead of creating a new object (#4135 )	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254 ) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Roman Leventov	1ebfa22955	Update Error prone configuration; Fix bugs (#4252 ) * Make Errorprone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule a error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Pierre	bba31e0c8b	close aggregators in indexing-hadoop mappers (#4251 )	2017-05-05 08:29:13 -07:00
Pierre	e9872f0695	do not flush on closed stream (#4250 )	2017-05-05 09:19:20 +09:00
Roman Leventov	8277284d67	Add Checkstyle rule to force comments to classes and methods to be Javadoc comments (#4239 )	2017-05-04 11:14:41 -07:00
Gian Merlino	97ddb38d75	DatasourceInputSplit: Serialize with write instead of writeUTF. (#4195 ) writeUTF has a limit of 64KB, making it difficult to write out splits that read a large number of descriptors for small segments.	2017-04-25 10:26:44 -07:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191 )	2017-04-20 18:12:28 -07:00
Gian Merlino	b4289c0004	Remove "granularity" from IngestSegmentFirehose. (#4110 ) It wasn't doing anything useful (the sequences were being concatted, and cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE. Changing it to Granularities.ALL gave me a 700x+ performance boost on a small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding making a lot of unnecessary column selectors.	2017-03-24 10:28:54 -07:00
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Akash Dwivedi	797488a677	Removing Integer.MAX column size limit. (#3743 ) * Removing Integer.MAX column size limit. * On demand creation of headerLong, use v2 instead of v3 * Avoid reusing the same object from a previous test. * Avoid reusing the same object from a previous test part#2 * code formatting. * GenericIndexed/Writer code review changes. * GenericIndexed/writer code review requested changes. * checkIndex() to static * native endianess for genericIndexedV2, code review requested changes. * Formatting * Hll fix. * use native endianess during bag size calculation. * Code review requested changes. * IOPeon close() changes. * use different tmp directory path for testing. * Code review requested changes.	2017-02-16 20:09:43 -06:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
baruchz	b7a88706f3	Add maprfs scheme (#3920 ) Add maprfs scheme to JobHelper to treated as HDFS deep storage	2017-02-12 18:37:58 -08:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
DaimonPl	93b71e265e	Extract HLL related code to separate module (#3900 )	2017-02-03 09:45:11 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Himanshu	7ced0e8759	log sizes of created smoosh files (#3817 ) * log when merging of intermediate segments starts during batch ingestion * log sizes of created smoosh files	2017-01-04 16:52:22 -08:00
Gian Merlino	d8702ebece	Filters: Use ColumnSelectorFactory directly for building row-based matchers. (#3797 ) * Filters: Use ColumnSelectorFactory directly for building row-based matchers. * Adjustments based on code review. - BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing. - HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec not thread safe. - Renamed rowType to rowSignature. - Added specializations for time-based vs non-time-based DimensionSelector in RBCSF. - Added convenience method DimensionHanderUtils.createColumnSelectorPlus. - Added singleton ZeroIndexedInts. - Added test cases for DimFilterHavingSpec. * Make ValueMatcherColumnSelectorStrategy actually use the associated selector. * Add RangeIndexedInts. * DimFilterHavingSpec: Fix concurrent usage guard on jdk7. * Add assertion to ZeroIndexedInts. * Rename no-longer-volatile members.	2017-01-03 14:30:22 -08:00
Erik Dubbelboer	c0c34f82ad	Fix reindexing of segments in Google Cloud Storage (#3788 ) Google Cloud Storage allows `:` in paths. For this reason `google` was not added to `da007ca3c2/indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java (L585)` Normally this is not an issue but when reindexing segments the Hadoop code for `getSplits` trips up on the `:` and returns: `Relative path in absolute URI` This patch URL encodes the `:` character making it work in Hadoop as well.	2016-12-20 17:16:33 -08:00
Jonathan Wei	880a021a7a	Fix missed travis failures from PR 3567 and 2798 (#3761 ) * Fix checkstyle failures from PR 3567 * Fix GranularityPathSpecTest compile failure	2016-12-07 19:07:31 -08:00
Navis Ryu	f794246ec1	Trimming out outside of given interval (#2798 ) * Trimming out outside of given interval (Fix for #2659) * addressed comments	2016-12-07 18:05:50 -08:00
Roman Leventov	c070b4a816	Fix concurrency defects, remove unnecessary volatiles (#3701 )	2016-11-22 16:42:28 -08:00
Erik Dubbelboer	7d36f540e8	WIP: Add Google Storage support (#2458 ) Also excludes the correct artifacts from #2741	2016-11-16 14:06:45 +05:30
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
praveev	52a74cf84f	Use timestamp in millis as Map key instead of DateTime object (#3674 ) * Use Long timestamp as key instead of DateTime. DateTime representation is screwed up when you store with an obj and read with a different DateTime obj. For example: The code below fails when you use DateTime as key ``` DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles"))); HashMap<DateTime, String> map = new HashMap<>(); map.put(odt, "abc"); DateTime dt = new DateTime(odt.getMillis()); System.out.println(map.get(dt)); ``` * Respect timezone when creating the file. * Update docs with timezone caveat in granularity spec * Remove unused imports	2016-11-11 10:20:20 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Akash Dwivedi	6a845e1f7b	Adding getDelegate() to directly access delegate. (#3616 ) 👍	2016-10-27 15:57:36 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Gian Merlino	dd0bb6da1e	Unit test for #3544 : Avoid exceptions for dataSource spec when using s3. (#3571 )	2016-10-17 12:41:43 -07:00
Navis Ryu	4554c1214b	Avoid exceptions for dataSource spec when using s3 (#3544 )	2016-10-14 18:24:19 -07:00
Akash Dwivedi	078de4fcf9	Use explicit version from HadoopIngestionSpec. (#3554 )	2016-10-07 13:59:14 -07:00
praveev	43cdc675c7	Add support for timezone in segment granularity (#3528 ) * Add support for timezone in segment granularity * CR feedback. Handle null timezone during equals check. * Include timezone in docs. Add timezone for ArbitraryGranularitySpec.	2016-10-03 08:15:42 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Fokko Driesprong	67920c114e	Fixed info message (#3481 )	2016-09-21 15:50:29 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
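
The 64KB limit noted in commit 97ddb38d75 (#4195) comes from the JDK: `DataOutput.writeUTF` stores the encoded length in a two-byte prefix, so it throws `UTFDataFormatException` once the encoded string exceeds 65535 bytes. The sketch below reproduces that failure and shows one possible workaround, a four-byte length prefix followed by raw UTF-8 bytes. The class and method names are hypothetical, and the length-prefixed layout is an assumption for illustration; it is not taken from the Druid change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Druid.
public class WriteUtfLimitDemo {

    // Builds a string of n copies of c (avoids String.repeat, which needs Java 11+).
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns true if writeUTF accepts the string, false if it hits the 65535-byte limit.
    public static boolean fitsInWriteUTF(String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    // Assumed workaround: 4-byte length prefix followed by the raw UTF-8 bytes.
    public static byte[] writeLengthPrefixed(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(bytes.length);
            out.write(bytes);
        }
        return buffer.toByteArray();
    }

    // Reads back a string written by writeLengthPrefixed.
    public static String readLengthPrefixed(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String big = repeat('x', 70_000);
        System.out.println("writeUTF accepts 70000 chars: " + fitsInWriteUTF(big));
        String roundTripped = readLengthPrefixed(writeLengthPrefixed(big));
        System.out.println("length-prefixed round trip ok: " + big.equals(roundTripped));
    }
}
```

An `int` length prefix raises the ceiling from 64KB to roughly 2GB per string, at the cost of two extra bytes per value and a format that `readUTF` can no longer read.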

848 Commits