Commit Graph

165 Commits

Author SHA1 Message Date
Gian Merlino 5ab17668c0 CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)
Also switch various firehoses to the new method.

Fixes #5585.
2018-04-06 08:06:45 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Jonathan Wei 723f7ac550
Add support for task reports, upload reports to deep storage (#5524)
* Add support for task reports, upload reports to deep storage

* PR comments

* Better name for method

* Fix report file upload

* Use TaskReportFileWriter

* Checkstyle

* More PR comments
2018-04-02 12:10:56 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Nathan Hartwell ea30c05355 Adding ParserSpec for Influx Line Protocol (#5440)
* Adding ParserSpec for Influx Line Protocol

* Addressing PR feedback

- Remove extraneous TODO
- Better handling of parse errors (e.g. invalid timestamp)
- Handle sub-millisecond timestamps

* Adding documentation for Influx parser

* Fixing docs
2018-03-26 14:28:46 -07:00
Charles Allen 58f110f7f8 Future-proof some Guava usage (#5414)
* Future-proof some Guava usage

* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20

* Use `Collections.emptyIterator()`

* Pretty formatting

* Make listenable future transforms a thing in default druid

* Format fix

* Add forbidden guava apis

* Make the ListenableFutrues.transformAsync have comments

* Undo intellij bad pattern matching in comments

* Futrues --> Futures

* Add empty iterators forbidding

* Fix extra `A`

* Correct method signature

* Address review comments

* Finish Gian review comments

* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
2018-03-20 08:59:33 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Roman Leventov 6b158abe3f Enforce optimal IndexedInts iteration (#5456)
* Enforce optimal IndexedInts iteration

* Fix remaining suboptimal usages
2018-03-09 09:42:40 -08:00
Nishant Bangarwa 219e77aeac SQL compatible Null Handling Part - Expressions and Storage Changes (#5278)
* SQL compatible Null Handling Part - Expressions, Storage and Dimension Selector Changes

fix travis strict compilation

* fix teamcity error - remove unused method

* review comments

* review comments

* more comments

* review comments

* review comments

* Optimize isNull method

* Optimize isNull in ColumnarFloats/Longs/Doubles

* review comment - separate classes for null and non-null columns

fix intellij inspection

* remove unused import

* More Review comments

* improve comment

* More review comments

* fix checkstyle

* more review comments

* review comments.

fix javadoc links

remove Nullable from ConstantColumnValueSelector

* review comments.

* satisfy teamcity inspections
2018-02-21 13:27:26 +01:00
Dan Suzuki 472ba14dfe Support Map type in ORC extension (#5363)
* Support map type in orc extension.
Added getMapObject in OrcHadoopInputRowParser
Updated parse tests to parse map-type field in OrcHadoopInputRowParserTest

* changed from for-loop to foreach

* added resolution of column names when map types are exploded to several
columns. updated the document as well -- orc.md.

* Update orc.md

change from review
2018-02-15 13:03:15 -08:00
Spyros Kapnissis be38b18a85 Support Hadoop batch ingestion for druid-azure-extensions (#5221)
* Support Hadoop batch ingestion for druid-azure-extensions #5181

* Fix indentation issues

* Fix forbidden-apis violation

* Code & doc improvements for azure-extensions

* Rename version to binaryVersion where appropriate to avoid confusion
* Set default protocol to wasbs://, as recommended by the Azure docs
* Add link to Azure documentation for wasb(s):// path
* Remove any colons from the dataSegment.getVersion()
* Added test for dataSegment.getVersion colon is replaced
* Use StringUtils.format for String concatenation
* remove empty lines

* Remove unneeded StringUtils.format from log.info
2018-02-15 16:45:18 +01:00
QiuMM aa7aee53ce Opentsdb emitter extension (#5380)
* opentsdb emitter extension

* doc for opentsdb emitter extension

* update opentsdb emitter doc

* add the ms unit to the constant name

* add a configurable event limit

* fix version to 0.13.0-SNAPSHOT

* using a thread to consume metric event

* rename method and parameter
2018-02-13 13:10:22 -08:00
Roman Leventov e64ffb10c2 Standartize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Kevin Conaway a5ba31c230 Fix graphite whitelist converter to handle array dimensions (#5269)
* Fix graphite whitelist converter to handle array dimensions

* Fix ambari whitelist converter to handle array dimensions
2018-01-29 21:46:46 +05:30
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Roman Leventov 61e6878afd Check Javadoc reference integrity (#5279) 2018-01-22 13:51:28 -08:00
Roman Leventov a346bbc6f3 Enforce spacing around foreach colon with Checkstyle (#5271) 2018-01-22 11:48:51 -08:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Jihoon Son 5d0619f5ce Support retrying for PrefetchableTextFilesFirehoseFactory when prefetch is disabled (#5162)
* Add RetryingInputStream

* unnecessary exception

* fix PrefetchableTextFilesFirehoseFactoryTest

* Fix retrying on connection reset

* fix start offset

* fix checkstyle

* fix check connection reset

* address comments

* fix compile

* address comments

* address comments
2018-01-10 17:37:19 +01:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
David Lim a7967ade4d Support replaceExisting parameter for segments pushers (#5187)
* support replaceExisting parameter for segments pushers

* code review changes

* code review changes
2018-01-03 16:13:21 -08:00
Jihoon Son 9199d61389 Automatic pendingSegments cleanup (#5149)
* PendingSegments cleanup

* fix build

* address comments

* address comments

* fix potential npe

* address comments

* fix build

* fix test

* fix test
2017-12-20 14:46:34 -08:00
Roman Leventov f18eba50ee Remove Aggregator.reset() (#5177) 2017-12-19 14:09:17 -08:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Gian Merlino 4f5e2b4549 Fix some unemitted alerts. (#5141) 2017-12-06 18:37:01 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implemetations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
chaoqiang b34d471aa2 Fix StatsD Emitter with blank character (#5044)
* fix equalDistribution worker select strategy

* replace anonymous Comparator

* keep previous version sorting comment

* fix code style

* update comment

* move JsonProperty

* fix statsD emitter with blank character
2017-11-16 11:12:24 -08:00
Roman Leventov 3541b7544b Prohibit and remove unused declarations in the processing module (#4930)
* Prohibit and remove unused declarations in the processing module

* Fix tests

* Fix integration tests

* Suppress unused

* Try to remove SuppressWarnings unused in VirtualColumn

* Remove reset 'false positives'

* Annotate CliCommandCreator as ExtensionPoint

* Unused import warning instead of error in IntelliJ

* Fixes

* Add comment

* Fix AzureBlob

* Fix CloudFilesBlob

* Address comments

* Add Project SDK section to INTELLIJ_SETUP.md

* Fix image
2017-11-09 09:27:27 -08:00
Roman Leventov 5eb08c27cb Add Emitter monitoring (#4973)
* Add Emitter monitoring

* Fix typo

* Fixes

* testing new emitter

* Fix failed test (#71)

* testing new emitter

* fix on failed test

* Remove emitter's readTimeout from docs

* Update docs

* Add HttpEmittingMonitor

* Update java-util to 1.3.2
2017-11-03 21:27:57 -06:00
Gian Merlino 1e0abcde87 Fix improper rounding of ORC decimals. (#5031) 2017-11-03 09:16:02 -07:00
Fokko Driesprong 21e1bf68f6 Update Avro to 1.8.0 (#5015)
The druid parquet extensions uses Avro 1.8 and therefore it is
required to update the Avro version itself also to 1.8 to avoid
classpath conflicts
2017-11-02 09:08:41 -06:00
Gian Merlino 0ce406bdf1
Introduce "transformSpec" at ingest-time. (#4890)
* Introduce "transformSpec" at ingest-time.

It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.

The "expression" fields use the same expression language as other
expression-based feature.

* Remove forbidden api.

* Fix compile error.

* Fix tests.

* Some more changes.

- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
  and ingestSegment firehose.

* Fix bad merge.

* Adjust imports.

* Adjust whitespace.

* Make Transform into an interface.

* Add missing annotation.

* Switch logger.

* Switch logger.

* Adjust test.

* Adjustment to handling for DatasourceIngestionSpec.

* Fix test.

* CR comments.

* Remove unused method.

* Add javadocs.

* More javadocs, and always decorate.

* Fix bug in TransformingStringInputRowParser.

* Fix bad merge.

* Fix ISFF tests.

* Fix DORC test.
2017-10-30 17:38:52 -07:00
Roman Leventov dc7cb117a1 Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886)
* Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism

* Fix MapVirtualColumn.makeColumnValueSelector()

* Minor fixes

* Fix IndexGeneratorCombinerTest

* DimensionSelector to return zeros when treated as numeric ColumnValueSelector

* Fix IncrementalIndexTest

* Fix IncrementalIndex.makeColumnSelectorFactory()

* Optimize MapBasedRow.getMetric()

* Fix VarianceAggregatorTest

* Simplify IncrementalIndex.makeColumnSelectorFactory()

* Address comments

* More comments

* Test
2017-10-13 21:44:17 -05:00
Jihoon Son 8d9902831e Refactoring PrefetchableTextFilesFirehoseFactory (#4836)
* Refactoring prefetchable firehose

* Fix to read cache when prefetch is disabled

* More tests

* Cleanup codes

* Add Fetcher

* Fix test failure

* Count file size

* Fix test

* rename generic parameter

* address comments

* address comments

* reuse buffer

* move Execs to java-util

* use execs

* Fix build
2017-10-13 21:39:28 -05:00
Jihoon Son 675c6c00dd Add checkstyle and intellij rule to prohibit unnecessary qualifiers in interfaces (#4958)
* add checkstyle and intellij rule

* fix tc fail
2017-10-13 07:56:19 -07:00
Jihoon Son d95915f8d2 Implement get methods for PrefetchableFirehose (#4948) 2017-10-12 16:14:45 +09:00
Jihoon Son 56fb11ce0b Lazy initialization for JavaScript functions (#4871)
* Lazy initialization of JavaScript functions

* Fix test failure

* Fix thread-safety and postpone js conf check

* Fix test fail

* Fix test

* Fix KafkaIndexTaskTest

* Move config check
2017-10-10 21:52:42 -07:00
Gian Merlino 1f2074c247 Bump versions in master to 0.11.1-SNAPSHOT. (#4878)
* Bump versions in master to 0.11.1-SNAPSHOT.

* Missed a few.
2017-09-28 17:09:51 -05:00
Goh Wei Xiang 2c30d5ba55 Add org.joda.time.DateTime.parse() to forbidden APIs (#4857)
* Added org.joda.time.DateTime#(java.lang.String) to forbidden API.

* Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API.

* Add additional APIs that may create DateTime with default time zone

* Add helper function that accepts formatter to parse String.

* Add additional forbidden APIs

* Replace existing usage of forbidden APIs

* Use wrapper class to enforce Chronology on DateTimeFormatter.

* Creates constant UtcFormatter for constant ISODateTimeFormat.
2017-09-27 17:46:44 -05:00
Gian Merlino bf8fd4c203 Add flattenSpec support to the Avro parser. (#4832)
* Add flattenSpec support to the Avro parser.

Also:

- Refactor the JSONPathParser a bit so it can share flattening code
  with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
  UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
  the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
  extension no longer uses it.

* Fix indentation.

* Fix equals/hashCode.
2017-09-26 09:26:06 -07:00
Jonathan Wei 09fcb75583 Add RequestLogEvent emitters config to graphite-emitter (#4678)
* Add RequestLogEvent emitters config to graphite-emitter

* eagerly compute emitter list

* use lambdas

* checkstyle
2017-09-22 06:14:32 -07:00
Roman Leventov e267f3901b Enforce Indentation with Checkstyle (#4799) 2017-09-21 13:06:48 -07:00
Roman Leventov a9d8539802 Remove IndexedInts.iterator() (#4811)
* Remove IndexedInts.iterator()

* Retain IndexedInts.iterator(), but don't extend Iterable

* Add BitmapValues

* Fix tests
2017-09-20 21:25:52 -07:00
Roman Leventov 88e9a80636 Rename ObjectValueSelector.get() to getObject(); Add getObject() and classOfObject() to ColumnValueSelector (#4801) 2017-09-19 14:47:20 -05:00
Roman Leventov 3f92184dd8 Inspection fixes (#4809) 2017-09-15 17:48:29 -07:00
Gian Merlino eb6791a60c TimestampAggregator: Avoid cross-classloader access of package-private field. (#4788)
* TimestampAggregator: Avoid cross-classloader access of package-private field.

* Simplify.

* Remove unused import.
2017-09-13 09:52:01 -07:00
Gian Merlino 2ce8123bdb Move scan-query from a contrib extension into core. (#4751)
* Move scan-query from a contrib extension into core.

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.

* Adjustments from review.

* Add back Select query.

* Adjust SQL docs.

* Restore SelectQuery link.
2017-09-13 09:51:24 -07:00
Bartosz Ługowski 8dddccc687 Graphite emitter - add plaintext protocol (#4265)
* Graphite emitter - add plaintext protocol. Configurable option of replacing slash to dot in metric name.

* Graphite emitter - fix misspelling in docs.

* Graphite emitter - extend docs.

* Graphite emitter - fix code style.
2017-08-29 06:23:06 -07:00
Roman Leventov 4d109a358a Refactoring of Storage Adapters (#4710)
* Factor QueryableIndexColumnSelectorFactory and IncrementalIndexColumnSelectorFactory out of QueryableIndexStorageAdapter and IncrementalIndexStorageAdapter; Add Offset.getBaseReadableOffset(); Remove OffsetHolder interface; Replace Cursor extends ColumnSelectorFactory with composition; Reduce indirection in ColumnValueSelectors created by QueryableIndexColumnSelectorFactory

* Don't override clone() in FilteredOffset (the prev. implementation was broken); Some warnings fixed

* Simplify Cursors in QueryableIndexStorageAdapter

* Address comments

* Remove unused and unimplemented methods from GenericColumn interface

* Comments
2017-08-28 18:07:31 -07:00