Commit Graph

231 Commits

Author SHA1 Message Date
Jihoon Son 1ad898bde2
Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Christoph Hösler 34f655599d Let MySQLConnector accept all UTF charsets and recommend utf8mb4 (#5411)
* Let MySQLConnector accept all UTF charsets and recommend utf8mb4

* Fix regex and remove newline in log statement
2018-03-13 01:16:10 -07:00
Parag Jain fba13d8978 time based checkpointing for Kafka Indexing Service (#5255)
* time based checkpointing

* add test and fix issue

* fix comments

* fix formatting

* update docs
2018-02-15 20:57:02 -08:00
David Lim 20a3164180 Support for router forwarding requests to active coordinator/overlord (#5369)
* allow router to forward requests to coordinator and overlord

* fix forbidden API

* more forbidden api fixes

* code review changes
2018-02-15 14:38:58 -08:00
Dan Suzuki 472ba14dfe Support Map type in ORC extension (#5363)
* Support map type in orc extension.
Added getMapObject in OrcHadoopInputRowParser
Updated parse tests to parse map-type field in OrcHadoopInputRowParserTest

* changed from for-loop to foreach

* added resolution of column names when map types are exploded to several
columns. updated the document as well -- orc.md.

* Update orc.md

change from review
2018-02-15 13:03:15 -08:00
Parag Jain b9b3be6965 fix segment info in Kafka indexing service docs (#5390)
* fix segment info in Kafka indexing service docs

* review updates
2018-02-15 09:57:30 -08:00
QiuMM aa7aee53ce Opentsdb emitter extension (#5380)
* opentsdb emitter extension

* doc for opentsdb emitter extension

* update opentsdb emitter doc

* add the ms unit to the constant name

* add a configurable event limit

* fix version to 0.13.0-SNAPSHOT

* using a thread to consume metric event

* rename method and parameter
2018-02-13 13:10:22 -08:00
Gian Merlino ed47a1e1a9
Lookups: Inherit "injective" from registered lookups, improve docs. (#5316)
Code changes:
- In the lookup-based extractionFns, inherit injective property from
  the lookup itself if not specified.

Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
  injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
  They are necessary and important since they work with filters and
  function cascades -- two things that the dimension specs do not do.
  They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
  more commonly used, so the docs read better this way.
2018-02-01 18:30:19 -08:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Fokko Driesprong cc32640642 Update the example of the dimensionsSpec (#5293)
The example was outdated with the dateSpec
2018-01-24 11:28:54 -08:00
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Atul Mohan 3cc4a0ab19 Support for encryption of MySQL connections (#5122)
* Encrypting MySQL connections

* Update docs

* Make verifyServerCertificate a configurable parameter

* Change password parameter and doc update

* Make server certificate verification disabled by default

* Update tostring

* Update docs

* Add check for trust store passwords

* Add warning for null password
2018-01-10 11:33:54 -08:00
Jonathan Wei 02544f9197 Add missing auth doc links (#5224) 2018-01-05 16:23:13 -06:00
Nishant Bangarwa 494e0b79ed Allow configuring header size for druid requests (#5174)
* Allow configuring header size for druid requests

* fix configuration name in doc.

* add more info to docs.

* Add info to kerberos doc.
2017-12-20 18:51:40 -08:00
Jonathan Wei f48c9d7be1
Basic auth extension (#5099)
* Basic auth extension

* Add auth configuration integration test

* Fix missing authorizerName property

* PR comments

* Fix missing @JsonProperty annotation

* PR comments

* more PR comments
2017-12-14 10:36:04 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
chaoqiang 50140ce820 StatsD Emitter Doc on blankHolder (#5101)
* fix equalDistribution worker select strategy

* replace anonymous Comparator

* keep previous version sorting comment

* fix code style

* update comment

* move JsonProperty

* fix statsD emitter with blank character

* Add blankHolder doc On statsD monitor
2017-11-18 12:00:47 -08:00
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndoffset duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Jonathan Wei 6840eabd87
Add Router connection balancers for Avatica queries (#4983)
* Add Router connection balancers for Avatica queries

* PR comments

* Adjust test bounds

* PR comments

* Add doc comments

* PR comments

* PR comment

* Checkstyle fix
2017-11-01 14:01:13 -07:00
elloooooo 52a162e302 define earlyMessegeRejectPeriod as the period after the taskduration (#4990) 2017-10-27 01:13:46 +05:30
chunghochen 0614b92df1 adding new post aggregators for test statistics to druid-stats extension (#4532)
* adding new post aggregators of test stats to druid-stats extension

* changes to address code review comments

* fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master

* add @Override annotation per CI log

* make changes per review comments/discussions

* remove some blocks per review comments
2017-10-09 23:43:27 -07:00
Gian Merlino bf8fd4c203 Add flattenSpec support to the Avro parser. (#4832)
* Add flattenSpec support to the Avro parser.

Also:

- Refactor the JSONPathParser a bit so it can share flattening code
  with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
  UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
  the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
  extension no longer uses it.

* Fix indentation.

* Fix equals/hashCode.
2017-09-26 09:26:06 -07:00
Roman Leventov b56a907145 Add namespace extraction thread config (#4833) 2017-09-25 09:52:36 -07:00
Charles Allen a6470c1d03 Move caffeine out of extension and make it the default cache implementation. (#4810)
* Move caffeine out of extension.

* Remove `JsonTypeName` from the class itself

* Fix bad docs

* Fix distribution pom

* Fix unused import

* Make caffeine default

* Address code comments

* Add more description around the jre version in the readme

* Add suggested comments
2017-09-22 10:46:55 -07:00
Jonathan Wei 09fcb75583 Add RequestLogEvent emitters config to graphite-emitter (#4678)
* Add RequestLogEvent emitters config to graphite-emitter

* eagerly compute emitter list

* use lambdas

* checkstyle
2017-09-22 06:14:32 -07:00
Jonathan Wei 164c73f2b2 Fix kerberos authenticator docs (#4822) 2017-09-19 14:32:22 -05:00
Jonathan Wei c2a0e753b6 Extension points for authentication/authorization (#4271)
* Extension points for authentication/authorization

* Address some PR comments

* Authorization result caching

* Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter

* Use Set for auth caching, close outputstreams in filters

* Don't close output stream on success in sanity check filter

* Add ConfigResourceFilter to coordinator lookups

* Fix filtering authorization check for empty resource list

* HttpClient users must explicitly escalate the client

* Remove response modification from PreResponseAuthorizationCheckFilter

* Remove extraneous pom.xml

* Fix unit test

* Better lifecycle management

* Rename AuthorizationManager to Authorizer

* Fix authorization denials for empty supervisor list

* Address some PR comments

* Address more PR comments

* Small cleanup

* Add Jetty HttpClient wrapper to Authenticator

* Remove Authorizer start/stop

* Restore immutable context map in DruidConnection, UT fix

* Fix/update docs

* Add authorization checks to EventReceiverFirehose

* Fix router authorization check failure, restore PreResponseAuthorizationFilter changes

* Compile fixes

* Test fixes

* Update Authenticator/Authorizer doc comments

* Merge fixes

* PR comments

* Fix test

* Fix IT

* More PR comments

* PR comments

* SSL fix
2017-09-15 23:45:48 -07:00
Gian Merlino 2ce8123bdb Move scan-query from a contrib extension into core. (#4751)
* Move scan-query from a contrib extension into core.

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.

* Adjustments from review.

* Add back Select query.

* Adjust SQL docs.

* Restore SelectQuery link.
2017-09-13 09:51:24 -07:00
Bartosz Ługowski 8dddccc687 Graphite emitter - add plaintext protocol (#4265)
* Graphite emitter - add plaintext protocol. Configurable option of replacing slash to dot in metric name.

* Graphite emitter - fix misspelling in docs.

* Graphite emitter - extend docs.

* Graphite emitter - fix code style.
2017-08-29 06:23:06 -07:00
Gian Merlino 9fbfc1be32 Add @ExtensionPoint and @PublicApi annotations. (#4433)
* Add @ExtensionPoint and @PublicApi annotations.

* Clean up wording.

* Remove unused import.

* Remove unused imports.

* Only types can be extension points.

* Adjust annotations some more.

* Remove unused import.

* Make ServletFilterHolder an extension point.

* Add a couple extension points, and update docs.
2017-08-28 14:50:58 -07:00
QiuMM 59a48a560a Redis cache extension doc (#4702)
* Redis cache extension doc

* link redis cache doc in extensions.md
2017-08-24 09:53:51 -05:00
Yuewen Wang c821bc9a5a Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607)
* Implement "earlyMessageRejectionPeriod" config discussed in issue #4599
    * implement the logics of this param
    * Added doc of this config
    * Added unit tests of it

* Update KafkaSupervisor.java

ameliorate comment

* fix format

* fix bug when rebasing
2017-08-11 09:12:08 +09:00
Peter Cunningham ede7cf9eef Added support for where clauses to JDBC lookups. (#4643)
* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Required changes based on code review.

* Required changes based on code review.

* Added support for where clauses to filter lookup values on ingestion.

Added a filter field to the JDBC lookups that is used to generate a
where clause so that only rows matching the filter value will be
brought into Druid. Example being filter="SOMECOLUMN=1"

* Updates based on code review, mainly formatting and small refactor of
the buildLookupQuery method.

* Fixed broken buildLookupQuery method

* Removed empty line.

* Updates per review comments
2017-08-09 10:47:46 -07:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 2fa4b10145 More fine-grained DI for management node types. Don't allocate processing resources on Router (#4429)
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router

* Fix examples

* Fixes

* Revert Peon configs and add comments

* Remove qualifier
2017-06-27 22:58:01 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Fokko Driesprong ff501e8f13 Add Date support to the parquet reader (#4423)
* Add Date support to the parquet reader

Add support for the Date logical type. Currently this is not supported. Since the parquet
date is number of days since epoch gets interpreted as seconds since epoch, it will fails
on indexing the data because it will not map to the appriopriate bucket.

* Cleaned up code and tests

Got rid of unused json files in the examples, cleaned up the tests by
using try-with-resources. Now get the filenames from the json file
instead of hard coding them and integrated general improvements from
the feedback provided by leventov.

* Got rid of the caching

Remove the caching of the logical type of the time dimension column
and cleaned up the code a bit.
2017-06-22 15:56:08 -05:00
Yuya Fujiwara 152d4e89ab Fix typo in the avro.md. (#4370) 2017-06-06 07:14:08 -07:00
David Lim 13ecf90923 Report Kafka lag information in supervisor status report (#4314)
* refactor lag reporting and report lag at status endpoint

* refactor offset reporting logic to fetch offsets periodically vs. at request time

* remove JavaCompatUtils

* code review changes

* code review changes
2017-06-05 13:26:25 -07:00
Jihoon Son 1150bf7a2c Refactoring Appenderator Driver (#4292)
* Refactoring Appenderator

1) Added publishExecutor and handoffExecutor for background publishing and handing segments off
2) Change add() to not move segments out in it

* Address comments

1) Remove publishTimeout for KafkaIndexTask
2) Simplifying registerHandoff()
3) Add increamental handoff test

* Remove unused variable

* Add persist() to Appenderator and more tests for AppenderatorDriver

* Remove unused imports

* Fix strict build

* Address comments
2017-06-02 07:09:11 +09:00
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create new pb obj everytime

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
Jihoon Son 11b7b1bea6 Add support for HttpFirehose (#4297)
* Add support for HttpFirehose

* Fix document

* Add documents
2017-05-25 16:13:04 -05:00
Jihoon Son 733dfc9b30 Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193)
* Add PrefetcheableTextFilesFirehoseFactory

* fix comment

* exception handling

* Fix wrong json property

* Remove ReplayableFirehoseFactory and fix misspelling

* Defer object initialization

* Add a temporaryDirectory parameter to FirehoseFactory.connect()

* fix when cache and fetch are disabled

* Address comments

* Add more test

* Increase timeout for test

* Add wrapObjectStream

* Move methods to Firehose from PrefetchableFirehoseFactory

* Cleanup comment

* add directory listing to s3 firehose

* Rename a variable

* Addressing comments

* Update document

* Support disabling prefetch

* Fix race condition

* Add fetchLock

* Remove ReplayableFirehoseFactoryTest

* Fix compilation error

* Fix test failure

* Address comments

* Add default implementation for new method
2017-05-18 15:37:18 +09:00
David Lim 8333043b7b add skipOffsetGaps flag (#4256) 2017-05-16 12:19:28 -06:00
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
hzy001 0c464f4a84 Fix docs (#4225)
* Fix one typo

Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>

* Fix deprecated links

Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>
2017-05-01 09:55:43 -07:00
Gian Merlino 631068b099 Fix broken DataSketches link. (#4221)
* Fix broken DataSketches link.

* Better fixed link.
2017-04-27 17:37:12 -07:00
satishbhor d51097c809 Fix lz4 library incompatibility in kafka-indexing-service extension (#4115)
* Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Bumped Kafka version to 0.10.2.0 for : Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Replaced Lists.newArrayList() with Collections.singletonList() For Fix lz4 library incompatibility in kafka-indexing-service extension #4115
2017-04-25 12:23:51 +09:00
Dongkyu Hwangbo 0d2e91ed50 Adding Kafka-emitter (#3860)
* Initial commit

* Apply another config: clustername

* Rename variable

* Fix bug

* Add retry logic

* Edit retry logic

* Upgrade kafka-clients version to the most recent release

* Make callback single object

* Write documentation

* Rewrite error message and emit logic

* Handling AlertEvent

* Override toString()

* make clusterName more optional

* bump up druid version

* add producer.config option which make user can apply another optional config value of kafka producer

* remove potential blocking in emit()

* using MemoryBoundLinkedBlockingQueue

* Fixing coding convention

* Remove logging every exception and just increment counting

* refactoring

* trivial modification

* logging when callback has exception

* Replace kafka-clients 0.10.1.1 with 0.10.2.0

* Resolve the problem related of classloader

* adopt try statement

* code reformatting

* make variables final

* rewrite toString
2017-04-04 14:07:43 -07:00
Fokko Driesprong add17fa7db Remove the metadataUpdateSpec from specfile (#3973)
Get rid of the metadataUpdateSpec section in the json example to
ingest parquet into druid. When this element is present, it will
fail start an indexing job.
2017-03-01 14:24:36 -08:00
Akash Dwivedi 94da5e80f9 Namespace optimization for hdfs data segments. (#3877)
* NN optimization for hdfs data segments.

* HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage
format.Docs update.

* Common utility function in DataSegmentPusherUtil.

* new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper

* reuse getHdfsStorageDirUptoVersion in
DataSegmentPusherUtil.getHdfsStorageDir()

* Addressed comments.

* Review comments.

* HdfsDataSegmentKiller requested changes.

* extra newline

* Add maprfs.
2017-03-01 09:51:20 -08:00
michaelschiff e5fb0e1ff5 New property for each metric that tells the StatsDEmitter to convert metric values from range 0-1 to 0-100. This (#3936)
prevents rates and percentages expressed as Doubles (0.xx) from being rounded down to 0.
2017-02-16 13:55:56 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Erik Dubbelboer 2aa2fa57b5 Simple doc fix (#3907) 2017-02-06 15:52:17 +05:30
Nishant Bangarwa a457cded28 Druid Extension to enable Authentication using Kerberos. (#3853)
* Add extension for supporting kerberos security

- This PR adds an extension for supporting druid authentication via
Kerberos.
- Working on the docs.

* Add docs

* review comments

* more review comments

* Block all paths by default

* more review comments - use proper Oid

* Allow extensions to override httpclient for integration tests

* Add kerberos lock to prevent multithreaded issues.

* review comment - remove enabled flag and fix router injection

* Add Cookie Handling and more detailed docs

* review comment - rename DruidKerberosConfig -> AuthKerberosConfig

* review comments

* fix travis failure on jdk7
2017-02-02 14:55:21 -06:00
kaijianding 33ae9dd485 streaming version of select query (#3307)
* streaming version of select query

* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity

* respect query limit within historical

* use constant

* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner

* add some test for scan query

* add scan query document

* fix merge conflicts

* add compactedList resultFormat, this format is better for json ser/der

* respect query timeout

* respect query limit on broker

* use static consts and remove unused code
2017-01-19 16:09:53 -06:00
Nishant f576a0ff14 Contrib Extension for Ambari Metrics Emitter (#3767)
* Contrib Extension for Ambari Metrics Emitter

extension to enable druid to send metrics to ambari metrics server
(https://cwiki.apache.org/confluence/display/AMBARI/Metrics)

review comments

switch to public repo

* review comments

* add docs

* fix pom version

* Add link for doc page in extensions.md

* remove unused imports

* review comments

review comments

remove unused dependency

review comment
2016-12-19 11:12:47 -08:00
David Lim 8eee259629 add documentation on segments generated (#3785) 2016-12-19 09:41:47 -08:00
Ninglin Du 469ab21091 [Feature] Thrift support for realtime and batch ingestion (#3418)
* Thrift ingestion plugin

1. thrift binary is platform dependent, use scrooge to generate java files to avoid style check failure
2. stream and hadoop ingesion are both supported, input format can be sequence file and lzo thrift block file.
3. base64 and protocol aware

change header

* fix conlicts in pom
2016-12-13 10:05:15 -08:00
Erik Dubbelboer 9f7050e221 Fix some grammar and spelling mistakes (#3717) 2016-11-28 11:49:30 -08:00
Himanshu 7d37f675ba fix the documented property name for specifying avro reader schema (#3708) 2016-11-22 15:02:41 -08:00
Parag Jain 7ee6bb7410 option to reset offest automatically in case of OffsetOutOfRangeException (#3678)
* option to reset offset automatically in case of OffsetOutOfRangeException
if the next offset is less than the earliest available offset for that partition

* review comments

* refactoring

* refactor

* review comments
2016-11-21 16:29:46 -06:00
Erik Dubbelboer 7d36f540e8 WIP: Add Google Storage support (#2458)
Also excludes the correct artifacts from #2741
2016-11-16 14:06:45 +05:30
Keuntae Park 094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Gian Merlino bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
Mark 575aeb843a Metadata Storage extension for Microsoft SqlServer (sqlserver-metadata-storage) (#3421) 2016-11-08 14:56:52 -08:00
Nicolas Colomer 37ecffb648 Add support for Confluent Schema Registry in the avro extension (#3529) 2016-11-08 16:10:45 -06:00
cheddar c49a9d5693 Call out semver expectations for modules (#3659)
* Call out semver expectations for modules

* Update modules.md

* Link to versioning
2016-11-04 12:52:05 -07:00
Gian Merlino 4203580290 URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512)
* URIExtractionNamespace: Treat null values in lookup maps as missing entries.

This is useful when many logical lookups are derived from the same base JSON file,
and some lookups' values may be unknown sometimes.

* Add test, logging message, and address other comments.

* Update docs.
2016-11-03 13:53:04 -07:00
David Lim 9226d4af3c configurable shutdownTimeout for Kakfa supervisor (#3497)
* configurable shutdownTimeout

* cr change
2016-09-23 13:26:45 -06:00
David Lim ca9114b41b add supervisor reset API (#3484)
* add supervisor reset API

* CR doc changes and kill running tasks / clear offsets from supervisor
2016-09-22 17:51:06 -07:00
Gian Merlino 27bd5cb13a Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473)
Fixes #3241.
2016-09-21 13:40:04 -06:00
David Lim 96fcca18ea update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452) 2016-09-20 22:51:15 +05:30
Slim 3175e17a3b Cached lookup module. first cut implementing JDBC cache (#2819) 2016-09-16 13:45:54 -07:00
Gian Merlino e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Himanshu a069257d37 avro-extension -- feature to specify multiple avro reader schemas inline (#3368)
* rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder

* feature to specify multiple schemas inline in avro module
2016-09-13 14:54:31 -07:00
Gian Merlino 76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Slim ba6ddf307e Adding hadoop kerberos authentification. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
David Lim 3a97fd4d6c doc fix (#3430) 2016-09-06 13:13:30 -06:00
Stéphane Derosiaux 48dce88aab Add flag binaryAsString for parquet ingestion (#3381) 2016-08-30 17:30:50 -07:00
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Fangjin Yang edb0eca3a9 fix docs (#3370) 2016-08-16 16:25:50 -07:00
Fangjin Yang 6beb8ac342 fix some docs and add new content (#3369) 2016-08-16 15:00:18 -07:00
Himanshu 46da682231 avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249) 2016-08-10 10:49:26 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Fangjin Yang d51ec398d4 fix parquet docs (#3304) 2016-08-01 07:54:48 -07:00
Keuntae Park 95a58097e2 Hadoop InputRowParser for Orc file (#3019)
* InputRowParser to decode OrcStruct from OrcNewInputFormat

* add unit test for orc hadoop indexing

* update docs and fix test code bug

* doc updated

* resove maven dependency conflict

* remove unused imports

* fix returning array type from Object[] to correct primitive array type

* fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list

* rebase and updated based on comments

* updated based on comments

* on reflecting review comments

* fix bug in typeStringFromParseSpec() and add unit test

* add license header
2016-07-26 09:42:56 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Charles Allen 3f1681c16c Caffeine cache extension (#3028)
* Initial commit of caffeine cache

* Address code comments

* Move and fixup README.md a bit

* Improve caffeine readme information

* Cleanup caffeine pom

* Address review comments

* Bump caffeine to 2.3.1

* Bump druid version to 0.9.2-SNAPSHOT

* Make test not fail randomly.

See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation

* Fix distribution and documentation

* Add caffeine to extensions.md

* Fix links in extensions.md

* Lexicographic
2016-07-06 15:42:54 -07:00
Charles Allen 8b7d9750ee Update extension docs for global lookup module (#3206) 2016-06-29 12:51:52 -07:00
David Lim b24425a280 update docs with new behavior (#3200) 2016-06-28 16:17:04 -07:00
Gian Merlino c12712e8b8 Move "libraries.md" out of docs, onto the main site. (#3159) 2016-06-16 18:14:35 -07:00
michaelschiff 7294ea87c3 link to statsd metrics emitter docs from development/extensions.html doc page (#3125) 2016-06-10 16:27:16 -07:00
Gian Merlino 99ee3f4dc3 Fixups, clarifications to lookup docs. (#3060) 2016-06-07 10:43:35 -07:00
Charles Allen fa41a6466a Cleanup the base lookup cluster wide config docs (#3061)
* Cleanup the base lookup cluster wide config docs

* Add better examples in lookups-cached-global.md

* Use actual valid stock lookups

* Fixed maps with :

* Add mix of lookups

* Better examples in extension

* Remove unneeded namespace requirement

* Add extra line space

* Add link to lookup tiers

* Renamed header
2016-06-07 10:42:41 -07:00
Charles Allen 8cac710546 Async lookups-cached-global by default (#3074)
* Async lookups-cached-global by default
* Also better lookup docs

* Fix test timeouts

* Fix timing of deserialized test

* Fix problem with 0 wait failing immediately
2016-06-03 15:58:10 -05:00
David Lim a2290a8f05 support seamless config changes (#3051) 2016-06-03 13:50:19 -07:00
Erik Dubbelboer b4737336e5 Added info about Google Cloud Storage (#3056) 2016-06-02 10:06:07 -07:00
David Lim f6c39cc844 Kafka task minimum message time (#3035)
* add KafkaIndexTask support for minimumMessageTime

* add Kafka supervisor support for lateMessageRejectionPeriod
2016-05-31 11:37:00 -07:00