druid

Author	SHA1	Message	Date
Jihoon Son	733dfc9b30	Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193) * Add PrefetcheableTextFilesFirehoseFactory * fix comment * exception handling * Fix wrong json property * Remove ReplayableFirehoseFactory and fix misspelling * Defer object initialization * Add a temporaryDirectory parameter to FirehoseFactory.connect() * fix when cache and fetch are disabled * Address comments * Add more tests * Increase timeout for test * Add wrapObjectStream * Move methods to Firehose from PrefetchableFirehoseFactory * Cleanup comment * add directory listing to s3 firehose * Rename a variable * Addressing comments * Update document * Support disabling prefetch * Fix race condition * Add fetchLock * Remove ReplayableFirehoseFactoryTest * Fix compilation error * Fix test failure * Address comments * Add default implementation for new method	2017-05-18 15:37:18 +09:00
Himanshu	daa8ef8658	Optional long-polling based segment announcement via HTTP instead of Zookeeper (#3902) * Optional long-polling based segment announcement via HTTP instead of Zookeeper * address review comments * make endpoint /druid-internal/v1 instead of /druid/internal so that jetty qos filters can be configured easily when needed * update segment callback initialization to be called only after the first segment list fetch has succeeded from all servers * address review comments * remove size check, no longer required because only segment servers announce themselves, not all peon processes * announce segment server on historical only after cached segments are loaded * fix checkstyle errors	2017-05-17 16:31:58 -05:00
Roman Leventov	d400f23791	Monomorphic processing of TopN queries with simple double aggregators over historical segments (part of #3798) (#4079) * Monomorphic processing of topN queries with simple double aggregators and historical segments * Add CalledFromHotLoop annotations to specialized methods in SimpleDoubleBufferAggregator * Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype * Fix a bug in SpecializationService * In SpecializationService, emit maxSpecializations warning only once * Make GenericIndexed.theBuffer final * Address comments * Newline * Reapply `439c906` (Make GenericIndexed.theBuffer final) * Remove extra PooledTopNAlgorithm.capabilities field * Improve CachingIndexed.inspectRuntimeShape() * Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape() * Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts * Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop * Make ValueMatcher implement HotLoopCallee * Doc fix * Fix inspectRuntimeShape() impl in ExpressionSelectors * INFO logging of specialization events * Remove modifier * Fix OrFilter * Fix AndFilter * Refactor PooledTopNAlgorithm.scanAndAggregate() * Small refactoring * Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations * Don't care about runtime shape in tests * Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests * Doc wording * Address comments * Remove MagicAccessorBridge and ensure Offset subclasses are public * Attach error message to element	2017-05-16 16:19:55 -07:00
Roman Leventov	b7a52286e8	Make @Override annotation obligatory (#4274) * Make MissingOverride an error * Make Travis script fail fast * Add missing Override annotations * Comment	2017-05-16 13:30:30 -05:00
David Lim	8333043b7b	add skipOffsetGaps flag (#4256)	2017-05-16 12:19:28 -06:00
Benedict Jin	e823085866	Improve `collection`-related code by reusing an immutable object instead of creating a new one (#4135)	2017-05-17 01:38:51 +09:00
Jihoon Son	50a4ec2b0b	Add support for headers and skipping thereof for CSV and TSV (#4254) * initial commit * small fixes * fix bug * fix bug * address code review * more cr * more cr * more cr * fix * Skip head rows for CSV and TSV * Move checking skipHeadRows to FileIteratingFirehose * Remove checking null iterators * Remove unused imports * Address comments * Fix compilation error * Address comments * Add more tests * Add a comment to ReplayableFirehose * Addressing comments * Add docs and fix typos	2017-05-15 22:57:31 -07:00
Fokko Driesprong	5ca67644e7	Remove slf4j as a dependency (#4233) Through the kafka-schema-registry-client in the Avro extension, slf4j would be packaged into the distribution. We don't want this, as it conflicts and throws an slf4j multiple-bindings warning. This will cause slf4j to fall back to the no-operation (NOP) binding.	2017-05-12 15:59:14 +09:00
Roman Leventov	1ebfa22955	Update Error Prone configuration; Fix bugs (#4252) * Make Error Prone the default compiler * Address comments * Make Error Prone's ClassCanBeStatic rule an error * Preconditions allow only %s pattern * Fix DruidCoordinatorBalancerTester * Try to give the compiler more memory * Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now * Don't show compiler warnings * Try different travis script * Fix travis.yml * Make Error Prone optional again * For error-prone compiler * Increase compiler's maxmem * Don't run Error Prone for benchmarks because of OOM * Skip install step in Travis * Remove MetricHolder.writeToChannel() * In travis.yml, check compilation before tests, because it may fail faster	2017-05-12 15:55:17 +09:00
Roman Leventov	e09e892477	Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184) * Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map). * Fix GroupByMergingQueryRunnerV2 * Fix QueryResourceTest * Expand the comment to Query.run(walker, context) * Remove legacy version of BySegmentSkippingQueryRunner.doRun() * Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs	2017-05-10 12:25:00 -07:00
Parag Jain	1fd177039d	fix auto reset - pause task instead of putting thread to sleep (#4244)	2017-05-08 15:08:25 -07:00
Parag Jain	eb8e1b0a97	Prevent interrupted exception from polluting log during supervisor shutdown (#4253) * Prevent interrupted exception from polluting log during supervisor shutdown * do nothing in case of InterruptedException	2017-05-08 15:05:25 -07:00
Parag Jain	4502c207af	fix injection bug and documentation (#4243)	2017-05-03 15:07:43 -05:00
Parag Jain	f9a61ea2ba	Kafka lag emitter - Kafka Indexing Service (#4194) * Kafka lag emitter * enforce minimum emit period to a minute * fixed comment	2017-05-02 17:30:07 -06:00
Roman Leventov	0bc18e7906	Make UpdateCounter proof against update count overflow (#4138) * Make UpdateCounter proof against update count overflow. * Fix	2017-05-01 09:59:49 -07:00
Bas van Schaik	54463941b9	Fix two alerts from lgtm.com: comparing two boxed primitive values using the == or != operator compares object identity, which may not be intended (#4212). Details: `013566ade9/files/extensions-core/datasketches/src/main/java/io/druid/query/aggregation/datasketches/theta/SketchEstimatePostAggregator.java (V144)` `013566ade9/files/extensions-core/datasketches/src/main/java/io/druid/query/aggregation/datasketches/theta/SketchMergeAggregatorFactory.java (V164)`	2017-04-26 14:56:25 -07:00
Akash Dwivedi	a2419654ea	Allow Hadoop configurations using runtime properties. (#4189)	2017-04-26 00:05:27 +05:30
Gian Merlino	3b92220015	Reduce log spam from Avro decoders. (#4205) These objects get constructed semi-frequently (any time a parser is deserialized) and so info logs are spammy. They'll still appear in task logs at least once, since they're part of the task definition and will get logged due to that.	2017-04-25 23:59:59 +05:30
Benedict Jin	de815da942	Some code refactoring for better performance of `Avro-Extension` (#4092) * 1. Collections.singletonList instead of Arrays.asList; 2. close FSDataInputStream/ByteBufferInputStream to release resources; 3. convert com.google.common.base.Function into java.util.function.Function; 4. other code refactoring * Put each param on its own line for code style * Revert the GenericRecordAsMap change regarding `Function`	2017-04-25 12:46:32 +09:00
satishbhor	d51097c809	Fix lz4 library incompatibility in kafka-indexing-service extension (#4115) * Fix lz4 library incompatibility in kafka-indexing-service extension #3266 * Bumped Kafka version to 0.10.2.0 for: Fix lz4 library incompatibility in kafka-indexing-service extension #3266 * Replaced Lists.newArrayList() with Collections.singletonList() for: Fix lz4 library incompatibility in kafka-indexing-service extension #4115	2017-04-25 12:23:51 +09:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191)	2017-04-20 18:12:28 -07:00
Jerry Chung	0bcfd9354c	Fix S3 deep storage push and s3 insert-segment-to-db (#4174) * Fix S3 deep storage push and s3 insert-segment-to-db * Less verbose checks in S3DataSegmentFinder	2017-04-14 19:42:10 -07:00
Gian Merlino	b2954d5fea	Better groupBy error messages and docs around resource limits. (#4162) * Better groupBy error messages and docs around resource limits. * Fix BufferGrouper test from datasketches. * Further clarify.	2017-04-13 10:38:53 -07:00
Roman Leventov	15f3a94474	Copy closer into Druid codebase (fixes #3652) (#4153)	2017-04-10 09:38:45 +09:00
Parag Jain	7e0d4c9555	secure supervisor endpoints (#3985)	2017-04-05 16:42:32 -07:00
Roman Leventov	73d9b31664	GenericIndexed minor bug fixes, optimizations and refactoring (#3951) * Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream * Fixes * Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils * Move constructors * Add GenericIndexedBenchmark * Comments * Typo * Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe * Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap * Optimize TableLongEncodingWriter * Add checks to SerializerUtils methods * Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream() * Update GenericIndexedBenchmark * SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian * Add GenericIndexedBenchmark.indexOf() * More checks in methods in SerializerUtils * Use helperBuffer.arrayOffset() * Optimizations in SerializerUtils	2017-03-27 14:17:31 -05:00
Benedict Jin	23f77ebd20	Explain Avro's unnecessary EOFException (#4098) (#4100) * Explain Avro's unnecessary EOFException (#4098) * add JIRA link into log message	2017-03-24 10:45:45 -05:00
Gian Merlino	4b9f975f50	Rename SketchAggregationWithSimpleDataTest. (#4105) Tests that don't end in "Test" won't get run automatically by Maven.	2017-03-23 14:20:50 -07:00
Akash Dwivedi	ff7f90b02d	relocate method in BufferAggregator. (#4071) * relocate method in BufferAggregator. * Unused import. * Detailed javadoc. * using Int2ObjectMap. * batch relocate. * Revert batch relocate. * Unused import. * code comments. * code comment.	2017-03-23 13:07:59 -07:00
Roman Leventov	84fe91ba0b	Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798) (#3889) * Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing. * Use Execs.singleThreaded() * RuntimeShapeInspector to support nullable fields * Make CalledFromHotLoop annotation Inherited * Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory * Close InputStream in SpecializationService * Formatting * Test specialized PooledTopNScanners * Set flags in PooledTopNAlgorithm directly * Fix tests dependent on CountAggregatorFactory toString() form * Fix * Revert CountAggregatorFactory changes * Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector * Remove duplicate RoaringBitmap dependency in the extendedset pom.xml * Fix * Treat ByteBuffers specially in StringRuntimeShape * Doc fix * Annotate BufferAggregator.init() with CalledFromHotLoop * Make triggerSpecializationIterationsThreshold an int * Remove SpecializationService.PerPrototypeClassState.of() * Add comments * Limit the number of specializations that SpecializationService can make * Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions * Use more efficient ConcurrentMap idioms in SpecializationService	2017-03-17 14:44:36 -05:00
Charles Allen	805d85afda	Allow compilation as Java8 source and target (#3328) * Allow compilation as Java8 source and target for everything except API * Remove conditions in tests which assume that we may run with Java 7 * Update easymock to 3.4 * Make Animal Sniffer check Java 1.8 usage; remove redundant druid-caffeine-cache configuration * Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity() * Remove java7 special for druid-api	2017-03-14 22:23:47 -06:00
Gian Merlino	3216134f8c	SQL: Make row extractions extensible and add one for lookups. (#3991) This is a reopening of #3989, since that PR was merged to master prematurely and accidentally.	2017-03-13 21:56:16 -07:00
Nishant Bangarwa	adbe89e7d6	Fix race in KafkaIndexTaskTest (#4031) task.pause(0) can return early, before the task is actually paused. Exception for failure: java.lang.AssertionError: expected:<PAUSED> but was:<READING> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at io.druid.indexing.kafka.KafkaIndexTaskTest.testRunWithOffsetOutOfRangeExceptionAndPause(KafkaIndexTaskTest.java:1229) To reproduce, add Thread.sleep(10000) at the beginning of the KafkaIndexTask.possiblypause method.	2017-03-09 07:34:46 -08:00
Gian Merlino	4ca5270e88	Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004) * Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. Includes two fixes: - groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results. - Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y". Also includes doc and test fixes: - groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it is once again. - chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate. * Remove unused import. * Restore buffer size.	2017-03-06 12:27:02 -06:00
Akash Dwivedi	bebf9f34c7	HdfsDataSegmentPusher bug fix (#4003) * Fix for HdfsDataSegmentPusher. * Add missing loadspec in actual descriptor file. Tests to check actual content of descriptor file.	2017-03-06 00:53:44 -08:00
Gian Merlino	df623ebfe3	Fix a couple bugs due to calling Period.getMillis(). (#4006)	2017-03-05 18:44:20 +05:30
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Gian Merlino	e63eefd7ff	Revert "SQL: Make row extractions extensible and add one for lookups. (#3989)" The PR was merged to master accidentally. This reverts commit `23927a3c96`.	2017-03-01 17:06:12 -08:00
Gian Merlino	23927a3c96	SQL: Make row extractions extensible and add one for lookups. (#3989) * SQL: Make row extractions extensible and add one for lookups. * Fix QuantileSqlAggregatorTest.	2017-03-01 17:03:43 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegmentFinder changes to use new storage format. Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
Akash Dwivedi	91344cbe57	Enable GenericIndexed V2 for built-in (druid-io managed) complex columns. (#3987) * Enable GenericIndexed V2 for complex columns. * SerializerBuilder to use GenericColumnSerializer.	2017-02-28 22:06:54 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-granularity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Gian Merlino	f21641f0dc	Fix over-optimistic log message. (#3963) "Wrote task log" could be logged before the output stream is flushed and closed, which could generate an error and not actually write the log.	2017-02-22 15:02:53 -08:00
Parag Jain	edb032b96d	add datasource in intermediate segment path (#3961)	2017-02-22 16:31:00 -06:00
Gian Merlino	985203b634	Finalize fields in postaggs (#3957) * initial commits for finalizeFieldAccess #2433 * fix some bugs to run a query * change name of method Queries.verifyAggregations to Queries.prepareAggregations * add UTs * fix UT failures * rebased to master * address comments and add a UT for arithmetic post aggregators * rebased to the master * address the comment of injection within arithmetic post aggregator * address comments and introduce decorate() in the PostAggregator interface. * Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add UTs for it 2. Some minor changes like renaming a method name. * Fix a code style mismatch. * Rebased to the master	2017-02-21 16:32:14 -08:00
Gian Merlino	16ef513c7d	SQL: Add context and contextual functions to planner. (#3919) * SQL: Add context and contextual functions to planner. Added support for context parameters specified as JDBC connection properties or a JSON object for SQL-over-JSON-over-HTTP. Also added features that depend on context functionality: - Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions. - Added support for time zones other than UTC via a "timeZone" context. - Pass down query context to Druid queries too. Also some bug fixes: - Fix DATE handling, it was largely done incorrectly before. - Fix CAST(__time TO DATE) which should do a floor-to-day. - Fix non-equality comparisons to FLOOR(__time TO X). - Fix maxQueryCount property. * Pass down context to nested queries too.	2017-02-15 14:09:14 -08:00
Gian Merlino	78b0d134ae	Require Java 8 and include some Java 8 dependencies. (#3914) * Require Java 8 and include some Java 8 dependencies. - Upgrade Jetty to 9.3.16.v20170120. - Upgrade DataSketches to 0.8.4. - Bundle caffeine-cache by default. - Still target Java 7 when compiling base Druid classes. * Update cluster, quickstart docs. * Remove oraclejdk7 from travis.yml.	2017-02-14 12:51:51 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
Parag Jain	1f263fe50b	alert when resetting offsets (#3931) * alert when resetting offsets * add more data to alerts	2017-02-13 13:49:24 -08:00

175 Commits