Commit Graph

1771 Commits

Author SHA1 Message Date
Jonathan Wei 4caa61d8fa Fix tutorial sample data filename, fix logger classname in metrics docs (#6299) 2018-09-04 21:47:12 -07:00
Eyal Yurman 10ca290d64 Correct file name typo in Quickstart tutorial (#6297)
Correct name wikipedia-2015-09-12-sampled.json.gz to wikiticker-2015-09-12-sampled.json.gz
2018-09-04 14:20:17 -07:00
Jonathan Wei 180e3ccfad Docs consistency cleanup (#6259) 2018-09-04 12:54:41 -07:00
QiuMM 9b04846e6b correct metric name in doc file (#6271) 2018-08-30 10:57:35 -07:00
Gian Merlino 431d3d8497 Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Himanshu 1fae6513e1 add "subtotalsSpec" attribute to groupBy query (#5280)
* add subtotalsSpec attribute to groupBy query

* don't send subtotalsSpec to downstream nodes from broker, and other updates

* address review comment

* fix checkstyle issues after merge to master

* add docs for subtotalsSpec feature

* address doc review comments
2018-08-28 17:46:38 -07:00
Jim Slattery d957295b98 spelling: storage (#6248) 2018-08-27 16:35:31 -07:00
Gian Merlino 0172326c62 SQL: Support more result formats, add columns header. (#6191)
* SQL: Support more result formats, add columns header.

- Add result formats for line-based JSON and CSV.
- Add X-Druid-Sql-Columns header with a list of all columns that
the response will contain.
- Add more comprehensive documentation on what callers should expect
when making Druid SQL queries.

* Fix some tests.

* Adjust tests.

* Adjust trailer, add types header.

* Fix trailers.
2018-08-26 23:00:14 -06:00
Susie 6e73ad6231 Fix bound query keys for Filtering on numeric values (#5881)
It is currently showing the use of `lowerBound` and `upperBound` instead of `lower` and `upper` for the range.
2018-08-23 14:07:10 -07:00
QiuMM ceb8f8e625 remove unnecessary tlsPortFinder to avoid potential port conflicts (#6194) 2018-08-23 10:41:49 -07:00
Ryan Plessner 9c500fb69f Add PostgreSQLConnectorConfig to expose SSL configuration options (#6181)
* Add PostgreSQLConnectorConfig to expose SSL configuration options for the Postgres Metadata Storage module.

* Fix checkstyle violations and add license header

* Convert properties in the postgres docs to be the full property path and fix typo

* Fix grammar in sslFactory docs
2018-08-21 16:45:27 -07:00
QiuMM 266f3dfbcb remove duplicate link to operations/recommendations.html (#6193) 2018-08-21 12:02:43 -07:00
QiuMM b0cf8d0252 'shutdownAllTasks' API for a dataSource (#6185)
* 'shutdownAllTasks' API for a dataSource

Change-Id: I30d14390457d39e0427d23a48f4f224223dc5777

* fix api path and return

Change-Id: Ib463f31ee2c4cb168cf2697f149be845b57c42e5

* optimize implementation

Change-Id: I50a8dcd44dd9d36c9ecbfa78e103eb9bff32eab9
2018-08-17 12:57:09 -04:00
Jonathan Wei 0c3bb47558 Change hybrid cache default types in docs to caffeine (#6182) 2018-08-17 12:17:43 -04:00
Caroline1000 f447b784de update sigar link (#6175) 2018-08-14 16:58:29 -07:00
QiuMM 69f555019b convert all time-intervals in ISO 8601 format to uppercase in doc files (#6118)
Change-Id: I904fed4cfb600a8a42664335557f611133a5078d
2018-08-13 12:58:47 -07:00
Jonathan Wei 94a937b5e8 New doc fixes (#6156) 2018-08-13 11:11:32 -07:00
Atul Mohan 064c22c937 Fix redirects (#6151) 2018-08-10 13:55:47 -07:00
Jonathan Wei b0805540af Fix kafka tutorial typo (#6141) 2018-08-09 18:41:05 -07:00
Jonathan Wei af0557c1f7 Unified configuration doc page (#6127)
* Unified configuration doc page

* Rename to index.md, update redirects

* PR comments

* PR comments

* PR comment
2018-08-09 14:52:14 -07:00
Jonathan Wei fea2ab7094 New docs intro (#6122)
* New docs intro

* PR comments

* Fix arch diagram

* PR comment

* PR comment

* PR comment
2018-08-09 14:19:11 -07:00
pdeva c028d18d74 update redis-cache documentation (#6109)
* update redis-cache documentation

added clarifying info on setup and enablement

* added link
2018-08-09 13:44:59 -07:00
Jonathan Wei aa660b8751 Add docs for virtual columns and transform specs (#6119)
* Add docs for virtual columns and transform specs

* PR Comments

* PR comment
2018-08-09 14:42:52 -06:00
Jonathan Wei 2b64025eaf Separate hadoop and native batch docs more (#6120)
* Separate hadoop and native batch docs more

* Rebase with parallel batch

* PR comments
2018-08-09 14:40:20 -06:00
Jonathan Wei 24f2e8ba26 New quickstart and tutorials (#6126)
* New quickstart and tutorials

* PR comments

* Fix tranquility
2018-08-09 14:37:52 -06:00
Jonathan Wei 2b0f03acb9 Unified API doc page (#6128)
* Unified API doc page

* PR comments

* Fix metadata endpoint
2018-08-09 14:27:42 -06:00
Gian Merlino 3525d4059e Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Andrés Gómez e270362767 Add stringLast and stringFirst aggregators extension (#5789)
* Add lastString and firstString aggregators extension

* Remove duplicated class

* Move first-last-string doc page to extensions-contrib

* Fix ObjectStrategy compare method

* Fix bad aggregator type name in doc

* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery

* Add getMaxStringBytes() method to support JSON serialization

* Fix null pointer exception at segment creation phase when the string value is null

* Control the valueSelector object class on BufferAggregators

* Perform all improvements

* Add java doc on SerializablePairLongStringSerde

* Refactor ObjectStrategy compare method

* Remove unused ;

* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests

* Remove unused imports

* Add license header

* Add class name to java doc class serde

* Throw exception if value is unsupported class type

* Move first-last-string extension into druid core

* Update druid core docs

* Fix null pointer exception when pair->string is null

* Add null control unit tests

* Remove unused imports

* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query

* Change SerializablePairLongString to extend SerializablePair

* Change vars from public to private

* Convert vars to primitive type

* Clarify compare comment

* Change IllegalStateException to ISE

* Remove TODO comments

* Control possible null pointer exception

* Add @Nullable annotation

* Remove empty line

* Remove unused parameter type

* Improve AggregatorCombiner javadocs

* Add filterNullValues option at StringLast and StringFirst aggregators

* Add filterNullValues option at agg documentation

* Fix checkstyle

* Update header license

* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR

* Fix StringFirstAggregatorCombiner

* Fix if condition at StringFirstAggregateCombiner

* Remove filterNullValues from string first/last aggregators

* Add isReset flag in FirstAggregatorCombiner

* Change Arrays.asList to Collections.singletonList
2018-08-01 10:52:54 -07:00
Caroline1000 7f89c72932 Add definition of 'NONE' to queryGranularity in ingestion.index doc (#6073)
* Add meaning of granularity = None to queryGranularity

* Fix format
2018-07-30 14:07:33 -07:00
Gian Merlino 63be028cee CompactionTask: Reject empty intervals on construction. (#6059)
* CompactionTask: Reject empty intervals on construction.

They don't make sense anyway, and it's better to fail fast.

* Switch API.
2018-07-30 08:52:50 -07:00
Eyal Yurman 94d6c9a0a5 Remove JDK 7 from build documentation. (#6031)
See issue #6030
2018-07-26 17:05:07 -07:00
Jonathan Wei efab3b0160 Add concat and textcat SQL functions (#6005) 2018-07-20 11:21:04 -07:00
Gian Merlino cd8ea3da8d SQL: Add server-wide default time zone config. (#5993)
* SQL: Add server-wide default time zone config.

* Switch API.
2018-07-18 13:12:40 -07:00
Caroline1000 5f78a333ad show that flatten will also work with avro extension (#5874)
* show that flatten will also work with avro extension

* fix url
2018-07-11 16:47:03 -07:00
Gian Merlino 04ea3c9f8c Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Caroline1000 b3976050ad add definition of balancerComputeThreads (#5865) 2018-07-05 09:54:36 -07:00
Caroline1000 ee4a5aafb0 add config values for GCS deep storage (#5875)
* add config values for GCS deep storage

* fix config values for GCS deep storage
2018-07-05 09:53:41 -07:00
Dylan Wylie 10642ef9ca Fix filtered request logging docs (#5924)
- Setting druid.request.logging.delegate has no effect. 
- The provider is injected based on a type parameter & this looks to be scoped to delegate for filtered loggers
2018-07-05 09:51:10 -07:00
scrawfor bf2a31a5bc Add new 'true' filter which always returns true. (#5711)
* Add new 'true' filter which always returns true.

* Add support for bitmap index.

* Adds documentation.

* Removes No-op Filter
2018-06-28 11:52:45 -07:00
Gian Merlino a28314349c Fix spelling of "propagate" in various places. (#5896)
One of these is a configuration parameter (introduced in #5429),
but it's never been in a release, so I think it's ok to rename it.
2018-06-25 09:18:08 -07:00
varaga b4b1b2a020 Provisioning support for ZooKeeper Authorization (#5701)
Review comments implemented
2018-06-15 14:02:01 -07:00
zhangxinyu e43e5ebbcd Materialized view implementation (#5556)
* implement materialized view

* modify code according to jihoonson's comments

* modify code according to jihoonson's comments - 2

* add documentation about materialized view

* use new HadoopTuningConfig in pr 5583

* add minDataLag and fix optimizer bug

* correct value of DEFAULT_MIN_DATA_LAG_MS

* modify code according to jihoonson's comments - 3

* use the boolean expression instead of if-else
2018-06-09 12:24:54 -07:00
Caroline1000 96feb479cd add order change needed for KIS in 0.12.0 (#5760) 2018-06-08 15:25:26 -07:00
Hongze Zhang cfa94b747b Update to jetty 9.4; Enable request decompression (#5624)
* Update to jetty 9.4; Enable request decompression; Add http compression config options

* Fix BadMessageException from jetty server at HttpGenerator.generateHeaders(...)
2018-06-08 14:53:08 -07:00
awelsh93 adbe22c05b Security - add anonymous authenticator (#5842)
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.

* Adding documentation

* Removed some fields from class AnonymousAuthenticator

* Updating docs
2018-06-07 10:17:54 -07:00
Siddharth Subramanian 37409dc2f4 Fix minor documentation error (#5851)
Adding a required `,` in the sample JSON
2018-06-06 12:51:56 -07:00
Ryan Plessner ee45ee6915 Fix docs to reflect the correct default max total row count for the IndexTuningConfig (#5845) 2018-06-05 13:15:12 -07:00
awelsh93 1a4707f09c Remove extra slash in endpoint (#5822) 2018-06-05 13:11:26 -07:00
Alexander Saydakov d1cdcd4895 Datasketches doc correction (#5816)
* func was renamed to operation during code review

* added missing descriptions, some cleanup
2018-06-05 17:52:37 +05:30
Atul Mohan 50ad7a45ff Fix authentication doc (#5813) 2018-05-30 11:10:48 -07:00
Jihoon Son 67ff7dacbd Support server-side encryption for s3 (#5740)
* Support server-side encryption for s3

* fix teamcity

* typo

* address comments

* Refactoring configuration injection

* fix doc

* fix doc
2018-05-28 20:22:08 -07:00
Joseph Glanville 5cbfb95e1f docs: Document inputFormat on Hadoop InputSpecs (#5784) 2018-05-24 21:44:37 -07:00
Gian Merlino bc0ff251a3 Docs: Clarify the meaning of maxSplitSize. (#5803) 2018-05-24 21:43:39 -07:00
Michael Schnupp 33b4eb624d fix freeSpacePercent in segmentCache.locations (#5765)
* fix freeSpacePercent in segmentCache.locations

* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash

* add test to verify correct behavior

* switch to Path and test with jimfs

Path allows using different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.

The behavior should be unchanged by this commit.

* Revert "switch to Path and test with jimfs"

This reverts commit 8b9a418d65.
2018-05-24 11:15:30 +09:00
Atul Mohan 1b9611a60e Local indexing from RDBMS (#5441)
* Local indexing from RDBMS

* Fix content

* Remove pom changes

* Remove extraneous space

* Add tests and update documentation

* Fix comments

* Fix docs

* Fix build-related issue

* Handle invalid strings

* Make target database independent of metadata storage

* Add firehose connector

* Fix accessibility

* Add docs

* Remove unused def

* Remove lazy instantiation of jsoniterator

* Move unused changes

* Move unused changes

* Fix build

* Make Sqlfirehose method private
2018-05-22 12:33:01 +09:00
Caroline1000 c73e3ea4f5 Provide examples to havingSpec filters (#5774)
* expand examples

* expand examples for filtered havingSpecs

* expand other having examples

* remove blank code block

* add better AND/OR/NOT examples

* fix indentation
2018-05-14 13:43:42 -07:00
Abhishek Kaushik aa23fe6386 Typo fix in historical doc (#5753) 2018-05-08 11:08:27 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
kaijianding c12c16385e support throw duplicate row during realtime ingestion in RealtimePlumber (#5693) 2018-05-04 10:12:25 -07:00
Dylan Wylie 2c5f0038fd Make lookup offheap buffer configurable (#5696)
* Make lookup offheap buffer configurable

Fixes #3663

* Address comments

* Update docs

* Update docs
2018-05-04 10:00:55 -07:00
Stuart McLean c2b5e5ec95 Default caffeine cache size (#5738)
* add default caffeine cache size based on runtime Xmx or max 1GB

* update docs for caffeine cache

* fix formatting

* test caffeine size should never be less than 0

* set caffeine max default size to 1G not 1M

* fix caffeine cache tests
2018-05-04 09:29:11 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists, which might
be bad for query performance, and a higher value will limit the number of persists but require
more JVM heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set maxBytesInMemory to 0 if the user does not set it as part of tuningConfig,
and set it to a fraction of max JVM memory when the ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
Gian Merlino 739e347320 Allow Hadoop dataSource inputSpec to be specified multiple times. (#5717)
* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
2018-05-03 13:51:57 -07:00
Stuart McLean d2b8d880ea include hybrid and caffeine in cache docs and show caffeine as default (#5737) 2018-05-03 09:52:05 -07:00
Jihoon Son d4311b4a5a Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702)
* Support enablePathStyleAccess and disableChunkedEncoding for aws client

* add an option for forceGlobalBucketAccessEnabled

* add missing doc
2018-05-02 10:45:38 -07:00
Jakub Kukul e2431ae161 Update defaultHadoopCoordinates in documentation. (#5720)
* Update defaultHadoopCoordinates in documentation.

To match changes applied in #5382.

* Remove a parameter with defaults from example configuration file.

If it has reasonable defaults, then why would it be in an example config file?

Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future.

Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes?

* Fix typo in documentation.
2018-04-30 20:49:14 -07:00
Dylan Wylie 754c80e74a Fix quickstart docs to specify that Java 8 is required. (#5722)
See #4907 #5719
2018-04-30 13:25:59 -07:00
Gian Merlino 0f8493846e Replace dev list references in docs. (#5723) 2018-04-30 11:25:45 -07:00
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Gian Merlino 762f8829e4 Add task action metrics, add taskId metric dimension. (#5714)
* Add task action metrics, add taskId metric dimension.

Adds two new metrics: task/action/log/time and task/action/run/time. Also
adds taskId as a dimension, to give us the ability to drill down into metrics
for an individual task. Also standardizes metrics-attachment using two helper
methods in IndexTaskUtils.

* Fix typo
2018-04-29 21:24:06 -07:00
Joseph Glanville 90cd05696e Document processing properties required for Middlemanager (#5660) 2018-04-29 17:20:17 -07:00
Jihoon Son 86746f82d8 Use mergeBuffer instead of processingBuffer in parallelCombiner (#5634)
* Use mergeBuffer instead of processingBuffer in parallelCombiner

* Fix test

* address comments

* fix test

* Fix test

* Update comment

* address comments

* fix build

* Fix test failure
2018-04-27 18:14:37 -07:00
Gian Merlino f81855d607 Add unauthorized errorCode to query docs. (#5691) 2018-04-26 13:06:25 -07:00
Caroline1000 fd76af9737 remove old prod cluster config link (#5676) 2018-04-23 18:00:24 -07:00
scrawfor 15f4ab2b31 Expose noop filter to users (#5597) 2018-04-18 07:57:07 -07:00
Gian Merlino fbf3fc178e Timeseries: Add "grandTotal" option. (#5640)
* Timeseries: Add "grandTotal" option.

* Modify whitespace.

* Checkstyle workaround.
2018-04-16 18:22:19 -07:00
Jonathan Wei d0b66a6af5 Fix HTTP OPTIONS request auth handling (#5638)
* Fix HTTP OPTIONS request auth handling

* PR comment

* More PR comments

* Fix

* PR comment
2018-04-16 18:09:56 -07:00
Jihoon Son 6b3bde0143 Fix granularitySpec doc (#5647) 2018-04-16 14:24:39 -04:00
Jonathan Wei 882b172318 Revert "Fix HTTP OPTIONS request auth handling (#5615)" (#5637)
This reverts commit df51a7bcb7.
2018-04-12 16:43:54 -07:00
Jonathan Wei df51a7bcb7 Fix HTTP OPTIONS request auth handling (#5615)
* Fix HTTP OPTIONS request auth handling

* Flip configuration boolean
2018-04-12 14:02:20 -07:00
Caroline1000 48c1a1ef57 change header from Data Schema to Ingestion Spec (#5631) 2018-04-11 21:42:54 -07:00
Nishant Bangarwa e6efd75a3d Add config to allow setting up custom unsecured paths for druid nodes. (#5614)
* Add config to allow setting up custom unsecured paths for druid nodes.

* return all resources for Unsecured paths

* review comment - Add test

* fix tests

* fix test
2018-04-11 17:10:07 -07:00
Caroline1000 afa75e04b7 change header in overlord console; minor querydoc change (#5625)
* change header in overlord console; minor querydoc change

* remove change to overlord console

* address Gian comments
2018-04-11 12:57:22 -07:00
Nishant Bangarwa b32aad9ab4 Fix some broken links in druid docs (#5622)
* Fix some broken links in druid docs

* review comment
2018-04-11 10:27:33 -07:00
Nishant Bangarwa 80fa5094e8 Fix Kerberos Authentication failing requests without cookies and excludedPaths config. (#5596)
* Fix Kerberos Authentication failing requests without cookies.

KerberosAuthenticator was failing `First` request from the clients.
After authentication we were setting the cookie properly but not
setting the authenticated flag in the request. This PR fixed that.

Additional Fixes -
* Removing of Unused SpnegoFilterConfig - replaced by
KerberosAuthenticator
* Unused internalClientKeytab and principal from KerberosAuthenticator
* Fix docs accordingly and add docs for configuring an escalated
client.

* Fix excluded path config behavior

* spelling correction

* Revert "spelling correction"

This reverts commit fb754b43d8.

* Revert "Fix excluded path config behavior"

This reverts commit 3901047769.
2018-04-09 20:45:35 -07:00
Alexander T ad6f234e1e Update lookups-cached-global.md (#5525)
Update lookup creation example to work with version 0.12.0
2018-04-06 16:13:17 -07:00
Jihoon Son 723857699c Add doc for automatic pendingSegments (#5565)
* Add missing doc for automatic pendingSegments

* address comments
2018-04-05 23:53:43 -07:00
Dylan Wylie ddd23a11e6 Fix taskDuration docs for KafkaIndexingService (#5572)
* With incremental handoff the changed line is no longer true.
2018-04-05 23:52:58 -07:00
Arup Malakar 0c4598c1fe Fix typo in avatica java client code documentation (#5553) 2018-03-29 16:36:40 -05:00
Dyana Rose db508cf3ca [docs] fix invalid example json (#5547)
https://github.com/druid-io/druid/issues/5546
2018-03-28 13:53:38 -07:00
Clint Wylie 50e0e7f97d Correct lookup documentation (#5537)
fixes #5536
2018-03-26 17:01:02 -07:00
Nathan Hartwell ea30c05355 Adding ParserSpec for Influx Line Protocol (#5440)
* Adding ParserSpec for Influx Line Protocol

* Addressing PR feedback

- Remove extraneous TODO
- Better handling of parse errors (e.g. invalid timestamp)
- Handle sub-millisecond timestamps

* Adding documentation for Influx parser

* Fixing docs
2018-03-26 14:28:46 -07:00
Atul Mohan ec17a44e09 Add result level caching to Brokers (#5028)
* Add result level caching to Brokers

* Minor doc changes

* Simplify sequences

* Move etag execution

* Modify cacheLimit criteria

* Fix incorrect etag computation

* Fix docs

* Add separate query runner for result level caching

* Update docs

* Add post aggregated results to result level cache

* Fix indents

* Check byte size for exceeding cache limit

* Fix indents

* Fix indents

* Add flag for result caching

* Remove logs

* Make cache object generation synchronous

* Avoid saving intermediate cache results to list

* Fix changes that handle etag based response

* Release bytestream after use

* Address PR comments

* Discard resultcache stream after use

* Fix docs

* Address comments

* Add comment about fluent workflow issue
2018-03-23 19:11:52 -07:00
Charles Allen ef21ce5a64 Add graceful shutdown timeout for Jetty (#5429)
* Add graceful shutdown timeout

* Handle interruptedException

* Incorporate code review comments

* Address code review comments

* Poll for activeConnections to be zero

* Use statistics handler to get active requests

* Use native jetty shutdown gracefully

* Move log line back to where it was

* Add unannounce wait time

* Make the default retain prior behavior

* Update docs with new config defaults

* Make duration handling on jetty shutdown more consistent

* StatisticsHandler is a wrapper

* Move jetty lifecycle error logging to error
2018-03-23 09:38:17 -07:00
Gian Merlino 0851f2206c Expanded documentation for DataSketches aggregators. (#5513)
Originally written by @AlexanderSaydakov in druid-io/druid-io.github.io#448.
I also added redirects and updated links to point to the new
datasketches-extension.html landing page for the extension, rather than to
the old page about theta sketches.
2018-03-21 18:19:27 -07:00
Jihoon Son 1ad898bde2 Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Slim 17c71a2a60 Make Doubles aggregators use 64bits by default (#5478)
* use 64-bit float representation for double based aggregator

Change-Id: Ia4f442037052add178f6ac68138c9d52f96c6e09

* review comments

Change-Id: I5a588f7364f236bf22f2b138e9d743bfb27c67fe
2018-03-19 19:13:04 -07:00
Christoph Hösler 34f655599d Let MySQLConnector accept all UTF charsets and recommend utf8mb4 (#5411)
* Let MySQLConnector accept all UTF charsets and recommend utf8mb4

* Fix regex and remove newline in log statement
2018-03-13 01:16:10 -07:00
Himanshu 8fae0edc95 allow arbitrary aggregators for reindexing with hadoop (#5294) 2018-03-07 17:13:56 -08:00
Hongze Zhang b084075279 Add http/https proxy options to PullDependencies.java (#5450) 2018-03-07 15:05:43 -08:00
Gian Merlino 7416d1d02d Add "joda" option to timeFormat extractionFn. (#5448) 2018-03-02 19:59:26 -08:00
Gian Merlino e4eaee3806 Support for disabling bitmap indexes. (#5402)
* Support for disabling bitmap indexes.

Can save space for columns where bitmap indexes are pointless (like
free-form text).

* Remove import.

* Fix CompactionTaskTest.

* Update for review comments.

* Review comments, tests.

* Fix test.
2018-02-28 19:19:56 -08:00
Alexander Korablev 6a3a5350b8 Make memcached protocol and locator configurable. (#5438)
* Make memcached protocol and locator configurable.

* Style fix.

* Style fix.

* Style fix.
2018-02-28 17:16:43 -08:00
Gian Merlino f3796bc81b SQL: Lower default JDBC frame size. (#5409)
The previous default of 100,000 was a bit excessive and could easily
lead to OOM errors on "select *" style queries.
2018-02-21 10:00:48 -08:00
Parag Jain fba13d8978 time based checkpointing for Kafka Indexing Service (#5255)
* time based checkpointing

* add test and fix issue

* fix comments

* fix formatting

* update docs
2018-02-15 20:57:02 -08:00
David Lim 20a3164180 Support for router forwarding requests to active coordinator/overlord (#5369)
* allow router to forward requests to coordinator and overlord

* fix forbidden API

* more forbidden api fixes

* code review changes
2018-02-15 14:38:58 -08:00
Javier Collado c45fe37611 Feature add coordinator servers endpoint documentation (#5392)
* Add new servers section to the coordinator endpoints documentation

* Remove trailing whitespace
2018-02-15 14:37:58 -08:00
Dan Suzuki 472ba14dfe Support Map type in ORC extension (#5363)
* Support map type in orc extension.
Added getMapObject in OrcHadoopInputRowParser
Updated parse tests to parse map-type field in OrcHadoopInputRowParserTest

* changed from for-loop to foreach

* added resolution of column names when map types are exploded to several
columns. updated the document as well -- orc.md.

* Update orc.md

change from review
2018-02-15 13:03:15 -08:00
Parag Jain b9b3be6965 fix segment info in Kafka indexing service docs (#5390)
* fix segment info in Kafka indexing service docs

* review updates
2018-02-15 09:57:30 -08:00
QiuMM aa7aee53ce Opentsdb emitter extension (#5380)
* opentsdb emitter extension

* doc for opentsdb emitter extension

* update opentsdb emitter doc

* add the ms unit to the constant name

* add a configurable event limit

* fix version to 0.13.0-SNAPSHOT

* using a thread to consume metric event

* rename method and parameter
2018-02-13 13:10:22 -08:00
Andrew 06f0067b6c Fix typo: change partitioningSpec to partitionsSpec in design/segments (#5376) 2018-02-12 11:03:40 -08:00
Stephanie Rivera 77bb2f9c9f Update post-aggregations.md (#5237)
* Update post-aggregations.md

I think this is clearer. I am not sure how multiplying by 100 is involved in averaging...

* Update post-aggregations.md

adding additional aggregator

* Update post-aggregations.md
2018-02-06 16:26:39 -08:00
Jihoon Son 2099b43e5f Add a new config object for compactConfig (#5264)
* add a new config object for compactConfig

* fix test

* address comments

* Update doc
2018-02-06 12:13:52 -08:00
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Jihoon Son 0db696b7c9 Fix CompactionTask doc (#5351)
* Fix CompactionTask doc

* Update coordinator doc
2018-02-05 22:38:34 -08:00
Himanshu 222a13e401
Use httpRemote and not remoteHttp for using HTTP Tasks Mgmt (#5334) 2018-02-02 14:16:43 -06:00
Gian Merlino ed47a1e1a9
Lookups: Inherit "injective" from registered lookups, improve docs. (#5316)
Code changes:
- In the lookup-based extractionFns, inherit injective property from
  the lookup itself if not specified.

Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
  injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
  They are necessary and important since they work with filters and
  function cascades -- two things that the dimension specs do not do.
  They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
  more commonly used, so the docs read better this way.
2018-02-01 18:30:19 -08:00
Jonathan Wei 2a892709e8 More memory limiting for HttpPostEmitter (#5300)
* More memory limiting for HttpPostEmitter

* Less aggressive large events test

* Fix tests

* Restrict batch queue size first, keep minimum of 2 queue items
2018-01-26 15:48:45 -08:00
Jonathan Wei f6749f1229 Allow separate truststore conf for HttpEmitter (#5298)
* Fix HttpEmitter TLS support, allow separate truststore conf

* PR comment, fix tests
2018-01-26 10:46:06 -06:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Fokko Driesprong cc32640642 Update the example of the dimensionsSpec (#5293)
The example was outdated with the dateSpec
2018-01-24 11:28:54 -08:00
Gian Merlino 53e3c7d1b2 SQL: Add additional unsupported features to the docs. (#5290) 2018-01-24 11:27:47 -08:00
Akash Dwivedi d6932c1621 java-util version update + Add UnusedConnectionTimeout config. (#5239)
* java-util version update + Add UnusedConnectionTimeout config.

* warn if unusedConnectionTime >= readTimeout.

* Doc update + addressed comment.

* Use compareTo to compare duration.

* remove unused variable.

* addressed comments and default for unusedConnectionTimeout.
2018-01-17 15:54:18 -06:00
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Shen Liu 3c69717202 Fix typo in configuration/index.md (#5249) (#5250)
* Fix #5212 - typo in auth.md.

* Fix typo in configuration (#5249)

* Add a backquote.

* Fix typo from HttpEmitterMonitor to HttpEmittingMonitor.
2018-01-11 18:29:12 +09:00
Atul Mohan 3cc4a0ab19 Support for encryption of MySQL connections (#5122)
* Encrypting MySQL connections

* Update docs

* Make verifyServerCertificate a configurable parameter

* Change password parameter and doc update

* Make server certificate verification disabled by default

* Update tostring

* Update docs

* Add check for trust store passwords

* Add warning for null password
2018-01-10 11:33:54 -08:00
Jihoon Son 972b4d189a Fix topN doc (#5240) 2018-01-09 20:10:13 -08:00
Jonathan Wei 02544f9197 Add missing auth doc links (#5224) 2018-01-05 16:23:13 -06:00
Himanshu a46d34daa2 HTTP based task/worker management. (#5104)
* just renaming of SegmentChangeRequestHistory etc

* additional change history refactoring changes

* WorkerTaskManager a replica of WorkerTaskMonitor

* HttpServerInventoryView refactoring to extract sync code and robustification

* Introducing HttpRemoteTaskRunner

* Additional Worker side updates
2018-01-04 19:19:35 -08:00
Jonathan Wei 935ac646f4
Upgrade to Calcite 1.15.0 (#5210)
* Upgrade to Calcite 1.15.0

* Use Filtration.eternity()
2018-01-04 12:11:24 -08:00
Shen Liu 5a8ea5f8ab Fix #5212 - typo in auth.md. (#5213) 2018-01-04 12:09:42 -08:00
Nishant Bangarwa 4cc31e4e7a Update Zookeeper version (#5184) 2018-01-04 10:59:20 +08:00
Yuya Fujiwara 3d3b04e1b8 docs: fix broken link to ingestions and tasks on the Druid Concepts page (#5197)
* fix broken links

* add newline
2017-12-27 07:55:24 -08:00
Himanshu 0f5c7d1aec Add freeSpacePercent config in segment location to enforce free space while storing segments (#5137)
* Add freeSpacePercent config in segment location config to enforce free space while storing segments

* address review comments

* address review comments: more doc on freeSpacePercent and use Double for freeSpacePercent
2017-12-21 15:31:09 +03:00
Nishant Bangarwa 494e0b79ed Allow configuring header size for druid requests (#5174)
* Allow configuring header size for druid requests

* fix configuration name in doc.

* add more info to docs.

* Add info to kerberos doc.
2017-12-20 18:51:40 -08:00
Jonathan Wei f48c9d7be1
Basic auth extension (#5099)
* Basic auth extension

* Add auth configuration integration test

* Fix missing authorizerName property

* PR comments

* Fix missing @JsonProperty annotation

* PR comments

* more PR comments
2017-12-14 10:36:04 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Slim 790678f02c Fix typo in docs (#5074)
Small fix for the realtime pull docs; fixes issue #5072
2017-11-22 23:16:36 -03:00
Chuanlei Ni 368d03146b assign granularity.all to SelectQuery by default (#5091) 2017-11-21 17:10:19 -08:00
Daniel 22c49b0d33 docs: fix broken link to broker configuration (#5105) 2017-11-21 13:32:00 +09:00
Roman Leventov dbb37b727d Add useL2 and populateL2 configs to HybridCache (#5088)
* Add useL2 and populateL2 configs to HybridCache

* typo
2017-11-20 16:57:05 -06:00
chaoqiang 50140ce820 StatsD Emitter Doc on blankHolder (#5101)
* fix equalDistribution worker select strategy

* replace anonymous Comparator

* keep previous version sorting comment

* fix code style

* update comment

* move JsonProperty

* fix statsD emitter with blank character

* Add blankHolder doc On statsD monitor
2017-11-18 12:00:47 -08:00
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndoffset duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Gian Merlino 486159ba8c SQL: Add TIMESTAMPADD. (#5079) 2017-11-16 12:00:34 -08:00
Gian Merlino 4fd4444b42 SQL: Add "array" result format, and document result formats. (#5032)
* SQL: Add "array" result format, and document result formats.

* Code style.
2017-11-13 20:24:06 -08:00
Jonathan Wei 9ac150c23a
Split internal client escalation from Authenticator interface (#5073)
* Split internal client escalation from Authenticator interface

* PR comments
2017-11-13 19:29:08 -08:00
Akash Dwivedi c1538f29fc maxQueryTimeout property in runtime properties. (#4852)
* maxQueryTimeout property in runtime properties.

* extra line

* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.

* Fix initialize method.

* remove unused import.

* doc update.

* some more details in doc about query failure.

* minor fix.

* decorating QueryRunner to set and verify context. Added by servers.

* remove whitespace.
2017-11-13 19:23:11 -06:00
Jonathan Wei 819700cbc5 Automatically insert authenticator/authorizer names into config properties (#5071) 2017-11-13 13:12:31 -08:00
Gian Merlino 9444da5038 SQL: Improved behavior when implicitly casting strings to date/time literals. (#5023)
* SQL: Improved behavior when implicitly casting strings to date/time literals.

- Handle all flavors of ISO8601 and SQL literals.
- Throw errors on other literals instead of silently transforming them to 0.

* Respect timeZone when format is null.
2017-11-10 17:43:22 +09:00
Himanshu bbb678efd7 fix lookups endpoint collisions (#5058)
* fix lookups endpoint collisions

* fix errors
2017-11-09 17:39:53 -08:00
Himanshu 2ecebb3173 Fix coordinator/overlord redirects when TLS is enabled (#5037)
* Fix coordinator/overlord redirects when TLS is enabled

* address review comment

* fix UTs

* workaround to not ignore URL instance to fix the teamcity build

* update tls doc
2017-11-09 13:10:28 -08:00
Roman Leventov a8dc056c09
Add retries for coordinator fetch and lookup start in LookupReferencesManager (#5029)
* Add retries for coordinator fetch and lookup start in LookupReferencesManager

* Fix LookupConfigTest

* Address comments

* Address more comments

* And address more comments

* Address comms

* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
2017-11-09 02:30:36 -03:00
Gian Merlino e6ec4310b1 IT: Switch to OpenJDK8 base image. (#5060)
* IT: Switch to OpenJDK8 base image.

Also split the Docker image into a base image and a child image, and
build the base image ahead of time for efficiency's sake. Also upgrade
ZK to 3.4.10.

* Additional comments about ZK upgrades.
2017-11-08 19:56:31 -08:00
Jihoon Son 5f3c863d5e Add compaction task (#4985)
* Add compaction task

* added doc

* use combining aggregators

* address comments

* add support for dimensionsSpec

* fix getUniqueDims and getUniqueMetics

* find unique dimensionsSpec

* fix compilation

* add unit test

* fix test

* fix test

* test for different dimension orderings and types, and doc for type and ordering

* add control for custom ordering and type

* update doc

* fix compile

* fix compile

* add segments param

* fix serde error

* fix build
2017-11-03 21:55:27 -06:00
Roman Leventov 5eb08c27cb Add Emitter monitoring (#4973)
* Add Emitter monitoring

* Fix typo

* Fixes

* testing new emitter

* Fix failed test (#71)

* testing new emitter

* fix on failed test

* Remove emitter's readTimeout from docs

* Update docs

* Add HttpEmittingMonitor

* Update java-util to 1.3.2
2017-11-03 21:27:57 -06:00
Jiaqi Liu 7c8b14f18c Fix doc link (#5040) 2017-11-03 11:04:33 -07:00
Jonathan Wei 6840eabd87
Add Router connection balancers for Avatica queries (#4983)
* Add Router connection balancers for Avatica queries

* PR comments

* Adjust test bounds

* PR comments

* Add doc comments

* PR comments

* PR comment

* Checkstyle fix
2017-11-01 14:01:13 -07:00
Himanshu 654cdc07f5 Document HTTP based segment management and Deprecate classes to remove in future (#4997)
* document http segment management

* deprecated classes that shouldn't be used any further
2017-11-01 12:59:27 -04:00
Gian Merlino 0ce406bdf1
Introduce "transformSpec" at ingest-time. (#4890)
* Introduce "transformSpec" at ingest-time.

It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.

The "expression" fields use the same expression language as other
expression-based features.

* Remove forbidden api.

* Fix compile error.

* Fix tests.

* Some more changes.

- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
  and ingestSegment firehose.

* Fix bad merge.

* Adjust imports.

* Adjust whitespace.

* Make Transform into an interface.

* Add missing annotation.

* Switch logger.

* Switch logger.

* Adjust test.

* Adjustment to handling for DatasourceIngestionSpec.

* Fix test.

* CR comments.

* Remove unused method.

* Add javadocs.

* More javadocs, and always decorate.

* Fix bug in TransformingStringInputRowParser.

* Fix bad merge.

* Fix ISFF tests.

* Fix DORC test.
2017-10-30 17:38:52 -07:00
elloooooo 52a162e302 define earlyMessageRejectionPeriod as the period after the taskDuration (#4990) 2017-10-27 01:13:46 +05:30
Himanshu ef4a8cb724 Optional segment load/drop management without zookeeper using http (#4966)
* introducing CuratorLoadQueuePeon

* HttpLoadQueuePeon based off of current code

* Revert "Remove SegmentLoaderConfig.numLoadingThreads config (#4829)"

This reverts commit d8b3bfa63c.

* SegmentLoadDropHandler copy/pasted from ZkCoordinator

* Revert "1-based counts in ZkCoordinator (#4917)"

This reverts commit e725ff4146.

* remove non-zk part from ZkCoordinator

* remove zk part from SegmentLoadDropHandler

* additional changes for segment load/drop management with http

* address review comments

* add some more logs

* Execs class is moved
2017-10-19 12:41:23 -07:00
Darío ce7bf3f325 Update batch-ingestion.md (#4963)
I've had problems ingesting several S3 files with Druid. After checking, I saw this: https://groups.google.com/forum/#!msg/druid-user/4L62vjor4NM/p8Z_R3lEAQAJ and realised that the docs hadn't been updated. This issue might have been solved in newer Druid versions, but for those who are still using older ones (0.9.2), it's nice to have this change made :)
2017-10-18 16:44:09 -07:00
Gian Merlino d5e83f9d50 Fix docs for MOD. (#4971) 2017-10-18 16:43:28 -07:00
Jihoon Son 52d7f74226 Add streaming aggregation as the last step of ConcurrentGrouper if data are spilled (#4704)
* Add streaming grouper

* Fix doc

* Use a single dictionary while combining

* Revert GroupByBenchmark

* Removed unused code

* More cleanup

* Remove unused config

* Fix some typos and bugs

* Refactor Groupers.mergeIterators()

* Add comments for combining tree

* Refactor buildCombineTree

* Refactor iterator

* Add ParallelCombiner

* Add ParallelCombinerTest

* Handle InterruptedException

* use AbstractPrioritizedCallable

* Address comments

* [maven-release-plugin] prepare release druid-0.11.0-sg

* [maven-release-plugin] prepare for next development iteration

* Address comments

* Revert "[maven-release-plugin] prepare for next development iteration"

This reverts commit 5c6b31e488.

* Revert "[maven-release-plugin] prepare release druid-0.11.0-sg"

This reverts commit 0f5c3a8b82.

* Fix build failure

* Change list to array

* rename sortableIds

* Address comments

* change to foreach loop

* Fix comment

* Revert keyEquals()

* Remove loop

* Address comments

* Fix build fail

* Address comments

* Remove unused imports

* Fix method name

* Split intermediate and leaf combine degrees

* Add comments to StreamingMergeSortedGrouper

* Add more comments and fix overflow

* Address comments

* ConcurrentGrouperTest cleanup

* add thread number configuration for parallel combining

* improve doc

* address comments

* fix build
2017-10-17 23:24:08 -07:00
Slim af2bc5f814 Make float default representation for DoubleSum/Min/Max aggregators (#4944)
* Introduce System wide property to select how to store double.
Set the default to store as float

Change-Id: Id85cca04ed0e7ecbce78624168c586dcc2adafaa

* fix tests

Change-Id: Ib42db724b8a8f032d204b58c366caaeabdd0d939

* Change the property name

Change-Id: I3ed69f79fc56e3735bc8f3a097f52a9f932b4734

* add tests and make default distribution store doubles as 64bits

Change-Id: I237b07829117ac61e247a6124423b03992f550f2

* adding mvn argument to parallel-test profile

Change-Id: Iae5d1328f901c4876b133894fa37e0d9a4162b05

* move property name and helper function to io.druid.segment.column.Column

Change-Id: I62ea903d332515de2b7ca45c02587a1b015cb065

* fix docs and clean style

Change-Id: I726abb8f52d25dc9dc62ad98814c5feda5e4d065

* fix docs

Change-Id: If10f4cf1e51a58285a301af4107ea17fe5e09b6d
2017-10-16 17:17:22 -07:00
Gian Merlino f51f346e36 SQL: Fix POWER doc, add test. (#4953) 2017-10-13 14:38:15 -07:00
Gian Merlino 5cfc7f9ef7 Fix formatting of SQL TRIM docs. (#4951) 2017-10-13 14:38:06 -07:00
Atul Mohan c07678b143 Synchronization of lookups during startup of druid processes (#4758)
* Changes for lookup synchronization

* Refactor of Lookup classes

* Minor refactors and doc update

* Change coordinator instance to be retrieved by DruidLeaderClient

* Wait before thread shutdown

* Make disablelookups flag true by default

* Update docs

* Rename flag

* Move executorservice shutdown to finally block

* Update LookupConfig

* Refactoring and doc changes

* Remove lookup config constructor

* Revert Lookupconfig constructor changes

* Add tests to LookupConfig

* Make executorservice local

* Update LRM

* Move ListeningScheduledExecutorService to ExecutorCompletionService

* Move exception to outer block

* Remove check to see future is done

* Remove unnecessary assignment

* Add logging
2017-10-12 21:22:24 -05:00
Jihoon Son dfa9cdc982 Prioritized locking (#4550)
* Implementation of prioritized locking

* Fix build failure

* Fix tc fail

* Fix typos

* Fix IndexTaskTest

* Addressed comments

* Fix test

* Fix spacing

* Fix build error

* Fix build error

* Add lock status

* Cleanup suspicious method

* Add nullables

*  add doInCriticalSection to TaskLockBox and revert return type of task actions

* fix build

* refactor CriticalAction

* make replaceLock transactional

* fix formatting

* fix javadoc

* fix build
2017-10-11 23:16:31 -07:00
Roman Leventov 7a9940d624 Add /readiness to HistoricalResource (#4916)
* Add /loadStatusCode to HistoricalResource

* Address comments

* Fixes
2017-10-11 20:35:52 -07:00
Gian Merlino b20e3038b6 SQL: Upgrade to Calcite 1.14.0, some refactoring of internals. (#4889)
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.

This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes #4203).

Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
  and SqlOperatorConversions rather than being special cases. This simplifies
  the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.

Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
  and DATE_TRUNC.

* Add missing @Override annotation.

* Adjustments for forbidden APIs.

* Adjustments for forbidden APIs.

* Disable GROUP BY alias.

* Doc reword.
2017-10-10 12:44:05 -07:00
Gian Merlino 4e1d0f49d8 Docs: Fix link to broker configuration. (#4934) 2017-10-10 11:18:46 -07:00
chunghochen 0614b92df1 adding new post aggregators for test statistics to druid-stats extension (#4532)
* adding new post aggregators of test stats to druid-stats extension

* changes to address code review comments

* fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master

* add @Override annotation per CI log

* make changes per review comments/discussions

* remove some blocks per review comments
2017-10-09 23:43:27 -07:00
Parag Jain 7cc18226cd add more tls configs to enable/disable specific cipher suites and protocols (#4902)
* add more tls configs to enable/disable specific cipher suites and protocols

* fix doc, allow empty list
2017-10-09 13:53:12 -07:00
Himanshu 0e856ee806 add configs to enable fast request failure on broker and historical (#4540)
* add configs to enable fast request failure on broker

* address review comments

* fix styling error

* fix style error

* have enableRequestLimit config instead of having user specify max limit

* add comment

* fix style error

* add UT for LimitRequestsFilter

* address review comments

* fix test

* make LimitRequestsFilterTest more robust

* fix JettyQosTest
2017-10-06 14:45:13 -05:00
Himanshu f69c9280c4 remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form (#4858)
* remove ServerConfig from DruidNode as all information needs to be present in DruidNode serialized form

* sanitize output of /druid/coordinator/v1/cluster endpoint
2017-09-28 10:40:59 -05:00
Gian Merlino bf8fd4c203 Add flattenSpec support to the Avro parser. (#4832)
* Add flattenSpec support to the Avro parser.

Also:

- Refactor the JSONPathParser a bit so it can share flattening code
  with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
  UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
  the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
  extension no longer uses it.

* Fix indentation.

* Fix equals/hashCode.
2017-09-26 09:26:06 -07:00
Roman Leventov b56a907145 Add namespace extraction thread config (#4833) 2017-09-25 09:52:36 -07:00
Charles Allen a6470c1d03 Move caffeine out of extension and make it the default cache implementation. (#4810)
* Move caffeine out of extension.

* Remove `JsonTypeName` from the class itself

* Fix bad docs

* Fix distribution pom

* Fix unused import

* Make caffeine default

* Address code comments

* Add more description around the jre version in the readme

* Add suggested comments
2017-09-22 10:46:55 -07:00
Jonathan Wei 09fcb75583 Add RequestLogEvent emitters config to graphite-emitter (#4678)
* Add RequestLogEvent emitters config to graphite-emitter

* eagerly compute emitter list

* use lambdas

* checkstyle
2017-09-22 06:14:32 -07:00
Roman Leventov d8b3bfa63c Remove SegmentLoaderConfig.numLoadingThreads config (#4829) 2017-09-20 21:27:43 -07:00
Himanshu a36adc63e4 [documentation] add more jvm and os guidelines (#4793)
* add more jvm and os guidelines

* address review comments

* add not so general guidelines too

* duplicate statement removal
2017-09-20 13:12:57 -07:00
Jonathan Wei 164c73f2b2 Fix kerberos authenticator docs (#4822) 2017-09-19 14:32:22 -05:00
Jonathan Wei c2a0e753b6 Extension points for authentication/authorization (#4271)
* Extension points for authentication/authorization

* Address some PR comments

* Authorization result caching

* Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter

* Use Set for auth caching, close outputstreams in filters

* Don't close output stream on success in sanity check filter

* Add ConfigResourceFilter to coordinator lookups

* Fix filtering authorization check for empty resource list

* HttpClient users must explicitly escalate the client

* Remove response modification from PreResponseAuthorizationCheckFilter

* Remove extraneous pom.xml

* Fix unit test

* Better lifecycle management

* Rename AuthorizationManager to Authorizer

* Fix authorization denials for empty supervisor list

* Address some PR comments

* Address more PR comments

* Small cleanup

* Add Jetty HttpClient wrapper to Authenticator

* Remove Authorizer start/stop

* Restore immutable context map in DruidConnection, UT fix

* Fix/update docs

* Add authorization checks to EventReceiverFirehose

* Fix router authorization check failure, restore PreResponseAuthorizationFilter changes

* Compile fixes

* Test fixes

* Update Authenticator/Authorizer doc comments

* Merge fixes

* PR comments

* Fix test

* Fix IT

* More PR comments

* PR comments

* SSL fix
2017-09-15 23:45:48 -07:00
Yuya Fujiwara 0fe734805b formatted table. (#4797) 2017-09-15 17:39:06 -07:00
Roman Leventov 267f415dc3 Update emitter library and add support for ParametrizedUriEmitter (#4722)
* Move emitters from io.druid.server.initialization to the dedicated io.druid.server.emitter package; Update emitter library to 0.6.0; Add support for ParametrizedUriEmitter; Support hierarchical properties in JsonConfigurator (was needed for ParametrizedUriEmitter)

* Log created RequestLoggers

* Fix forbidden API

* Test fix

* More Http and Parametrized Http Emitter docs

* Switch to debug level
2017-09-13 17:17:19 -05:00
Gian Merlino 2ce8123bdb Move scan-query from a contrib extension into core. (#4751)
* Move scan-query from a contrib extension into core.

Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion

This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.

This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.

Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.

* Adjustments from review.

* Add back Select query.

* Adjust SQL docs.

* Restore SelectQuery link.
2017-09-13 09:51:24 -07:00
Kenji Noguchi c0be050242 Add jq expression support in flattenSpec (#4171)
* add jq expression in the flattenSpec

* more tests

* add benchmark

* fix style

* use JsonNode for both JSONPath and JQ

* clean up

* more clean up

* add documentation

* fix style

* move jackson-jq version to dependencyManagement section. remove commented code

* oops. revert wrong fix

* throw IllegalArgumentException for JQ syntax error

* remove e.printStackTrace() that is forbidden

* touch
2017-09-12 14:18:34 -05:00
Gian Merlino 4909c48b0c SQL: Full TRIM support. (#4750)
* SQL: Full TRIM support.

- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING

* Remove unused import.

* Fix tests, add RTRIM / LTRIM.

* Remove unused imports.

* BTRIM and docs.

* Replace for with foreach.
2017-09-12 11:49:08 -07:00
Parag Jain b5e839b3db injectable sslcontextfactory for jetty server and key manager factory algorithm (#4769)
* injectable sslcontextfactory for jetty server

key manager factory algorithm

* explicitly set trustAll certificates to false in sslcontextfactory
2017-09-12 11:45:03 -07:00
dgolitsyn 752151f6cb Add CachingCostBalancerStrategy (#4731)
* Add CachingCostBalancerStrategy; Rename ServerView.ServerCallback to ServerRemovedCallback

* Fix benchmark units

* Style, forbidden-api, review, bug fixes

* Add docs

* Address comments
2017-09-08 12:23:04 -05:00
Gian Merlino 33c0928bed Collapse worker select strategies, change default, add strong affinity. (#4534)
* Collapse worker select strategies, change default, add strong affinity.

- Change default worker select strategy to equalDistribution. It is
  more generally useful than fillCapacity.
- Collapse the *WithAffinity strategies into the regular ones. The
  *WithAffinity strategies are retained for backwards compatibility.
- Change WorkerSelectStrategy to return nullable instead of Optional.
- Fix a couple of errors in the docs.

* Fix test.

* Review adjustments.

* Remove unused imports.

* Switch to DateTimes.nowUtc.

* Simplify code.

* Fix tests (worker assignment started off on a different foot)
2017-09-04 14:40:55 -07:00
Himanshu 06ac6678e6 DruidLeaderSelector interface for leader election and Curator based impl. (#4699)
* DruidLeaderSelector interface for leader election and Curator based impl. DruidCoordinator/TaskMaster are updated to use the new interface.

* add fake DruidNode binding in integration-tests module

* add docs on DruidLeaderSelector interface

* remove start/stop and keep register/unregister Listener in DruidLeaderSelector interface

* updated comments on DruidLeaderSelector

* cache the listener executor in CuratorDruidLeaderSelector

* use same latch owner name that was used before

* remove stuff related to druid.zk.paths.indexer.leaderLatchPath config

* randomize the delay when giving up leadership and restarting leader latch
2017-09-01 09:49:04 -07:00
Gian Merlino 9078925cab Docs for finalizingFieldAccess post-aggregator. (#4737) 2017-08-31 11:45:49 -07:00
Bartosz Ługowski 8dddccc687 Graphite emitter - add plaintext protocol (#4265)
* Graphite emitter - add plaintext protocol. Add a configurable option to replace slashes with dots in metric names.

* Graphite emitter - fix misspelling in docs.

* Graphite emitter - extend docs.

* Graphite emitter - fix code style.
2017-08-29 06:23:06 -07:00
Gian Merlino daf3c5f927 Add "round" option to cardinality and hyperUnique aggregators. (#4720)
* Add "round" option to cardinality and hyperUnique aggregators.

Also turn it on by default in SQL, to make math on distinct counts
work more as expected.

* Fix some compile errors.

* Fix test.

* Formatting.
2017-08-28 14:52:11 -07:00
Gian Merlino 9fbfc1be32 Add @ExtensionPoint and @PublicApi annotations. (#4433)
* Add @ExtensionPoint and @PublicApi annotations.

* Clean up wording.

* Remove unused import.

* Remove unused imports.

* Only types can be extension points.

* Adjust annotations some more.

* Remove unused import.

* Make ServletFilterHolder an extension point.

* Add a couple of extension points, and update docs.
2017-08-28 14:50:58 -07:00
zhangxinyu1 b04261e7a2 In the indexing service flow chart, it should be the MiddleManager that writes task status to ZooKeeper (#4654) 2017-08-27 10:17:15 -07:00