Commit Graph

7439 Commits

Author SHA1 Message Date
Gian Merlino 8030f1cb67 Be more respectful of maxRowsInMemory. (#3284)
- Appenderator: Respect maxRowsInMemory across all sinks.
- KafkaIndexTask: Respect maxRowsInMemory across all partitions.
2016-07-26 15:02:35 -06:00
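The idea of enforcing one row limit across every sink, rather than per sink, can be sketched as below. This is only an illustration under stated assumptions: `RowBudgetSketch`, `add`, and `persistAll` are hypothetical names and are not taken from Druid's Appenderator or KafkaIndexTask.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a single in-memory row budget shared by all sinks.
final class RowBudgetSketch {
    private final int maxRowsInMemory;
    private final Map<String, Integer> rowsPerSink = new HashMap<>();
    private int totalRows = 0;
    private int persistCount = 0;

    RowBudgetSketch(int maxRowsInMemory) {
        this.maxRowsInMemory = maxRowsInMemory;
    }

    /** Adds one row to the named sink. Returns true if the shared limit was hit and a persist ran. */
    boolean add(String sinkId) {
        rowsPerSink.merge(sinkId, 1, Integer::sum);
        totalRows++;
        // The check uses the total across sinks, not the count of any single sink.
        if (totalRows >= maxRowsInMemory) {
            persistAll();
            return true;
        }
        return false;
    }

    /** Stand-in for flushing every sink to disk; here it only resets the counters. */
    private void persistAll() {
        rowsPerSink.clear();
        totalRows = 0;
        persistCount++;
    }

    int totalRowsInMemory() {
        return totalRows;
    }

    int persistCount() {
        return persistCount;
    }
}
```

With a limit of 3, rows added to three different sinks trigger one persist, even though no single sink holds more than one row.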
Gian Merlino 9b5523add3 Reference counting, better error handling for resources in groupBy v2. (#3268)
Refcounting prevents releasing the merge buffer, or closing the concurrent
grouper, before the processing threads have all finished. The better
error handling prevents an avalanche of per-runner exceptions when grouping
resources are exhausted, by grouping those all up into a single merged
exception.
2016-07-27 01:59:02 +05:30
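A minimal sketch of the reference-counting idea described above, assuming a resource that must not be released until every processing thread is done with it. `RefCountedSketch` and its methods are hypothetical and do not reproduce the groupBy v2 classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the releaser runs only when the last reference is dropped.
final class RefCountedSketch {
    // Starts at 1: the creator holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable releaser;

    RefCountedSketch(Runnable releaser) {
        this.releaser = releaser;
    }

    /** Takes a reference. Returns false if the resource has already been released. */
    boolean acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops one reference; the call that drops the count to zero runs the releaser. */
    void release() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("already released");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    releaser.run();
                }
                return;
            }
        }
    }
}
```

Each worker calls `acquire()` before touching the resource and `release()` when finished; the owner's own `release()` then frees the resource only if no worker still holds it.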
Charles Allen 188a4bc89a Revert "Optionally intern ServerInventoryView inventory objects. (#3238)" (#3286)
This reverts commit a931debf79.
Fixes #3283

The core issue here is that realtime nodes announce their size as 0, so a coordinator which interns the realtime version of the data segment will not be able to see the new sized announcement when handoff occurs.

This is caused by the `equals` method on a `DataSegment` only evaluating the identifier. The `equals` method *should* be correct for object equivalence, and things which need to check equivalence of some sub-portion of the object should do so explicitly.
2016-07-26 11:47:34 -07:00
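The interaction between interning and an identifier-only `equals` can be illustrated with the sketch below. The `Segment` class and the map-based interner are hypothetical stand-ins, not Druid's `DataSegment` or the interner that was reverted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: interning objects whose equals() ignores some fields.
final class InternSketch {
    /** Stand-in for a segment descriptor; equality looks at the identifier only. */
    static final class Segment {
        final String id;
        final long size;

        Segment(String id, long size) {
            this.id = id;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Segment && id.equals(((Segment) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    private final Map<Segment, Segment> pool = new HashMap<>();

    /** Returns the first instance seen that is equals() to the argument. */
    Segment intern(Segment s) {
        Segment previous = pool.putIfAbsent(s, s);
        return previous == null ? s : previous;
    }
}
```

If a segment is first interned with size 0 and later announced with its real size under the same identifier, `intern` keeps returning the size-0 instance, which matches the symptom described in the revert message.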
Keuntae Park 95a58097e2 Hadoop InputRowParser for Orc file (#3019)
* InputRowParser to decode OrcStruct from OrcNewInputFormat

* add unit test for orc hadoop indexing

* update docs and fix test code bug

* doc updated

* resolve maven dependency conflict

* remove unused imports

* fix returning array type from Object[] to correct primitive array type

* fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list

* rebase and updated based on comments

* updated based on comments

* on reflecting review comments

* fix bug in typeStringFromParseSpec() and add unit test

* add license header
2016-07-26 09:42:56 -07:00
Erik Dubbelboer 76fabcfdb2 Fix #2782, Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231)
On systems with only one processor this test fails.
2016-07-25 15:51:09 -07:00
kaijianding 3dc2974894 Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227)
* save TimestampSpec in metadata.drd

* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00
David Lim d5ed3f1347 change expected response from ACCEPTED to OK (#3280) 2016-07-23 19:48:30 -07:00
Gian Merlino b316cde554 Appenderator tests for disjoint query intervals. (#3281) 2016-07-23 19:48:15 -07:00
Charles Allen c58bbfa0c6 Intern DataSegments in SQLMetadataSegmentManager (#3267)
* Helps with heap pressure on coordinator
2016-07-21 16:46:08 -07:00
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Navis Ryu cd7337fc8a Calculate max split size based on numMapTask in DatasourceInputFormat (#2882)
* Calculate max split size based on numMapTask

* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
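One way to derive a maximum split size from a target number of map tasks is sketched below. The formula and the zero guard are assumptions made for illustration: the commit message does not say how `DatasourceInputFormat` computes the value or which operation could raise the `ArithmeticException`. `SplitSizeSketch` and `maxSplitSize` are hypothetical names.

```java
// Hypothetical sketch: size splits so that roughly numMapTasks splits cover the input.
final class SplitSizeSketch {
    /**
     * Returns the ceiling of totalBytes / numMapTasks.
     * A non-positive task count falls back to one split covering everything,
     * which avoids an integer division by zero (assumed here to be the kind of
     * ArithmeticException the commit refers to).
     */
    static long maxSplitSize(long totalBytes, int numMapTasks) {
        if (numMapTasks <= 0) {
            return totalBytes;
        }
        return (totalBytes + numMapTasks - 1) / numMapTasks;
    }
}
```

For example, 100 bytes over 3 tasks gives a maximum split size of 34, so three splits are enough to cover the input.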
Parag Jain fd798d32bc fix testSecuredGetServer ut (#3262) 2016-07-20 10:20:13 -07:00
Gian Merlino 06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
Gian Merlino 50db86cb17 Quickstart: Use hadoopyString for batch indexing instead of string. (#3263) 2016-07-19 10:18:10 -07:00
Nishant 47894c4eff add comment for default hadoop coordinates (#3257)
1) Modify CliHadoopIndexer to share constant from `TaskConfig.DEFAULT_DEFAULT_HADOOP_COORDINATES`
2) add comment to pom.xml as discussed in
https://github.com/druid-io/druid/pull/3044

fix name
2016-07-18 15:23:11 -07:00
Emanuele Cesena a9a73c5f71 Distribution: pull-deps compiled hadoop version (#3044) 2016-07-18 09:39:15 -07:00
Gian Merlino 13d8d96bc6 Update to guice-4.1.0. (#3222) 2016-07-18 08:08:43 -07:00
Gian Merlino dd4ec751d0 Update docs for working with Hadoop dependencies. (#3252)
- Attempt to make things clearer in general
- Point out that HDFS deep storage and MR jobs don't use the same loading mechanism
- Recommend using mapreduce.job.classloader = true when possible
2016-07-18 07:47:58 -05:00
Himanshu 3f82108d15 optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250) 2016-07-17 18:47:52 -07:00
Nishant 7995818220 Increase test timeout to prevent failing on slow machines (#3224)
The test was constantly timing out on one of the slow build machines;
increasing the timeout fixed it.

Running io.druid.granularity.QueryGranularityTest
Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776
sec - in io.druid.granularity.QueryGranularityTest
2016-07-17 18:44:48 -07:00
Gian Merlino 90f5d8cd17 Fix path in cluster.md. (#3253) 2016-07-17 08:30:20 -07:00
Gian Merlino 6cd1f5375b Better harmonized dimensions for query metrics. (#3245)
All query metrics now start with toolChest.makeMetricBuilder, and all of
*those* now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id"
moved to common code, since all query metrics added it anyway.

In particular this will add query-type specific dimensions like "threshold"
and "numDimensions" to servlet-originated metrics like query/time.
2016-07-14 11:55:51 -07:00
Hyukjin Kwon 55e7a52475 Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215) 2016-07-14 09:19:17 -07:00
Nishant a1715c8cda fix-3237 (#3244)
DruidBroker use FilteredServerInventoryView instead of
ServerInventoryView
2016-07-13 22:30:35 -07:00
Gian Merlino 6a03a0cfec Fix ingest/persist/backPressure docs. (#3243) 2016-07-13 21:56:28 -07:00
Gian Merlino c622a25236 BenchmarkDataGenerator: Don't generate timestamps at the end instant of the interval. (#3242)
Because timestamps at the end instant are not actually part of the interval. This
affected benchmark numbers, since it meant some data points would not be queried
(the interval for the query was based on getDataInterval) and also the
TimestampCheckingOffsets could not use the allWithinThreshold optimization.
2016-07-14 10:20:10 +05:30
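The point about the end instant can be shown with a half-open interval, where the start is included and the end is excluded. The sketch below uses plain `long` millisecond values; `HalfOpenIntervalSketch` and its methods are hypothetical and are not the BenchmarkDataGenerator code.

```java
// Hypothetical sketch: generate timestamps that all fall inside [start, end).
final class HalfOpenIntervalSketch {
    /** True if t lies in the half-open interval [start, end). */
    static boolean contains(long start, long end, long t) {
        return t >= start && t < end;
    }

    /**
     * Returns n evenly spaced timestamps beginning at start.
     * Because i is always less than n, the largest value stays strictly below end.
     * Assumes (end - start) * n fits in a long.
     */
    static long[] timestamps(long start, long end, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = start + (end - start) * i / n;
        }
        return out;
    }
}
```

A generator that instead produced a value equal to `end` would emit a row that a query over the same interval never sees.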
Charles Allen a931debf79 Optionally intern ServerInventoryView inventory objects. (#3238) 2016-07-14 08:49:26 +05:30
Gian Merlino 3ab4a4efbc Fix formatting in granularities doc. (#3229) 2016-07-08 09:29:58 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Charles Allen 5d9fd0a713 Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming (#3043)
* Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming
* Missed query from #2859

* Make inReadOnlyTransaction part of SQLMetadataConnector
2016-07-06 16:55:27 -07:00
Charles Allen 3f1681c16c Caffeine cache extension (#3028)
* Initial commit of caffeine cache

* Address code comments

* Move and fixup README.md a bit

* Improve caffeine readme information

* Cleanup caffeine pom

* Address review comments

* Bump caffeine to 2.3.1

* Bump druid version to 0.9.2-SNAPSHOT

* Make test not fail randomly.

See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation

* Fix distribution and documentation

* Add caffeine to extensions.md

* Fix links in extensions.md

* Lexicographic
2016-07-06 15:42:54 -07:00
Gian Merlino b8a4f4ea7b DumpSegment: Add --dump bitmaps option. (#3221)
Also make --dump metadata respect --column.
2016-07-06 12:42:50 -07:00
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or a lexicographic topN that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
Charles Allen bfa5c05aaa Make global lookup cache introspector class public (#3199)
* Make global lookup cache introspector class public
* Fixes #3187

* Make KafkaLookupExtractorIntrospectionHandler a public static class
2016-07-01 15:50:57 -07:00
Himanshu e1313e4b90 add log msg when event recvr firehose buffer is full (#3209) 2016-07-01 17:35:30 -05:00
Fangjin Yang 8eeae2e844 remove bad docs on setting up clusters (#3188) 2016-07-01 15:41:40 -05:00
Parag Jain 99844dfeb5 remove need for tmp extensions dir (#3211)
correct lib path relative to base distribution dir
2016-07-01 12:55:57 -07:00
Bingkun Guo d2636d1a64 [pull-deps] If --clean flag is not set, skip creating root extension directories if they already exist. (#3130) 2016-07-01 11:18:57 -05:00
Charles Allen 8b7d9750ee Update extension docs for global lookup module (#3206) 2016-06-29 12:51:52 -07:00
Xavier Léauté 485e381387 remove datasource from hadoop output path (#3196)
fixes #2083, follow-up to #1702
2016-06-29 08:53:45 -07:00
Gian Merlino 4c9aeb7353 Revert "update druid console version (#3189)" (#3203)
This reverts commit 496b801bc3.
2016-06-29 08:29:57 -07:00
Jonathan Wei f3a3662133 Fix compile error in SearchBinaryFnTest (#3201) 2016-06-29 09:44:45 -05:00
David Lim b24425a280 update docs with new behavior (#3200) 2016-06-28 16:17:04 -07:00
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add UT for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
David Lim 1d40df4bb7 fix kafka consumer concurrent access during shutdown (#3193) 2016-06-28 13:23:17 -07:00
Xavier Léauté 496b801bc3 update druid console version (#3189) 2016-06-27 18:02:40 -07:00
du00cs bf53490d70 fix: no split file will throw IndexOutOfBounds Exception (#3179) 2016-06-26 12:50:18 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
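The second goal, merging streams without materializing them, can be pictured as a k-way merge over inputs that are each already sorted by grouping key. The sketch below assumes that ordering and a single summed metric per key; `StreamingMergeSketch` is hypothetical and does not reproduce the "v2" strategy's actual classes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch: lazily merge key-sorted (key, count) streams, summing equal keys.
final class StreamingMergeSketch {
    /** The current head row of one input, plus the iterator it came from. */
    private static final class Head {
        final Map.Entry<String, Long> row;
        final Iterator<Map.Entry<String, Long>> source;

        Head(Map.Entry<String, Long> row, Iterator<Map.Entry<String, Long>> source) {
            this.row = row;
            this.source = source;
        }
    }

    /** Convenience constructor for a row. */
    static Map.Entry<String, Long> row(String key, long value) {
        return new SimpleEntry<String, Long>(key, value);
    }

    static Iterator<Map.Entry<String, Long>> merge(List<Iterator<Map.Entry<String, Long>>> inputs) {
        // The heap holds at most one row per input, so memory use is O(number of inputs).
        final PriorityQueue<Head> heap = new PriorityQueue<Head>(
            (a, b) -> a.row.getKey().compareTo(b.row.getKey()));
        for (Iterator<Map.Entry<String, Long>> in : inputs) {
            if (in.hasNext()) {
                heap.add(new Head(in.next(), in));
            }
        }
        return new Iterator<Map.Entry<String, Long>>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public Map.Entry<String, Long> next() {
                if (heap.isEmpty()) {
                    throw new NoSuchElementException();
                }
                String key = heap.peek().row.getKey();
                long sum = 0;
                // Drain every head that shares the smallest key, refilling from its source.
                while (!heap.isEmpty() && heap.peek().row.getKey().equals(key)) {
                    Head h = heap.poll();
                    sum += h.row.getValue();
                    if (h.source.hasNext()) {
                        heap.add(new Head(h.source.next(), h.source));
                    }
                }
                return new SimpleEntry<String, Long>(key, sum);
            }
        };
    }
}
```

Because each output row is produced as soon as all inputs have moved past its key, the merged result never has to be held in memory as a whole. An order-by on a non-key column would break this, since the final order is not known until every row has been seen.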
Nishant 0aa7d71ca5 Add doc link to eclipse formatting settings as well (#3131) 2016-06-24 15:27:50 -07:00