60 Commits

Author SHA1 Message Date
Fokko Driesprong 4c709ddbc1 Bump Apache Parquet to 1.10.1 (#7645)
https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1101
2019-05-12 14:38:33 -07:00
Jonathan Wei fafbc4a80e Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Jonathan Wei 8bc5eaa908 Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Clint Wylie ccfd1244d1 fix parquet parse performance issue (#6833)
* check that value is present before conversion to prevent silent, expensive exception and fix another bug

* cleanup

* now with fewer parentheses
2019-01-10 09:18:57 -08:00
Clint Wylie 4ec068642d move parquet extension input formats up a level to `org.apache.druid.data.input.parquet.DruidParquetInputFormat` for `parquet` and `org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat` for `parquet-avro` (#6727) 2018-12-13 16:33:42 -08:00
Clint Wylie 8f8a569aa2 faster flattening for non-existent paths (#6654)
* faster flattening for non-existent properties to circumvent upstream json-path issue

* fix json provider

* revert to using null instead of undefined
2018-11-27 14:14:11 -08:00
Roman Leventov 87b96fb1fd Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Mingming Qiu 93b0d58571 optimize input row parsers (#6590)
* optimize input row parsers

* address comments
2018-11-16 11:48:32 +08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Clint Wylie 1224d8b746 overhaul 'druid-parquet-extensions' module, promoting from 'contrib' to 'core' (#6360)
* move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql

* fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays

* remove leftover print

* convert micro timestamp to millis

* checkstyle

* add ignore for .parquet and .parq to rat exclude

* fix legit test failure from avro flattener behavior change

* fix rebase

* add exclusions to pom to cut down on redundant jars

* refactor tests, add support for unwrapping lists for parquet-avro, review comments

* more comment

* fix oops

* tweak parquet-avro list handling

* more docs

* fix style

* grr styles
2018-11-05 21:33:42 -08:00