8555 Commits

Author SHA1 Message Date
scrawfor
bf2a31a5bc Add new 'true' filter which always returns true. (#5711)
* Add new 'true' filter which always returns true.

* Add support for bitmap index.

* Adds documentation.

* Removes No-op Filter
2018-06-28 11:52:45 -07:00
zhangxinyu
d857345b7d add method getRequiredColumns for DimFilter (#5872)
* add method getRequiredColumns for DimFilter

* deal with the NullPointerException when DimFilter is null
2018-06-27 15:45:46 -07:00
Surekha
0f429298cf Fix Kafka Indexing task pause forever if no events in taskDuration (#5656) (#5899)
* Fix Kafka Indexing task pause forever (#5656)

* Fix NullPointerException in overlord if taskGroups does not contain the groupId
* If the endOffset is same as startOffset, still let the task resume instead of returning
   endOffsets early which causes the tasks to pause forever and ultimately fail on timeout

* Address PR comment

* Remove the null check and do not return null from generateSequenceName
2018-06-25 19:29:36 -07:00
陈春斌
7649742943 Use ReentrantReadWriteLock in DimensionDictionary (#5883) 2018-06-25 12:35:26 -07:00
Gian Merlino
a28314349c Fix spelling of "propagate" in various places. (#5896)
One of these is a configuration parameter (introduced in #5429),
but it's never been in a release, so I think it's ok to rename it.
2018-06-25 09:18:08 -07:00
George Paraskevas
4b111929ec Fix typo lage->large, improve warning message (#5890) 2018-06-22 17:33:02 -07:00
Jihoon Son
8c5ded0fad Splitting KafkaIndexTask for better code maintenance (#5854)
* Refactoring KafkaIndexTask for better code maintenance

* fix bug

* fix test

* add annotation

* fix checkstyle

* remove SetEndOffsetsResult
2018-06-22 13:00:03 -07:00
Clint Wylie
1a7adabf57 Coordinator segment balancer max load queue fix (#5888)
* Coordinator segment balancer will now respect "maxSegmentsInNodeLoadingQueue" config

* allow moves from full load queues

* better variable names
2018-06-20 23:04:41 -07:00
Niketh Sabbineni
0982472c90 Use historical node instead of realtime for querying (#4764)
* Use historical node instead of realtime for querying

* Incorporated code review comments

* Incorporate code review comments

* Remove artifact comment

* Consider non-historical nodes as realtime
2018-06-20 22:53:56 -07:00
Jonathan Wei
0eae89170e Make DruidPlanner constructor public again (#5891) 2018-06-20 11:10:50 -07:00
Surekha
8619adb5b9 Improve task retrieval APIs on Overlord (#5801)
* Add the new tasks api in overlordResource

It takes 4 optional query params
* state (pending/running/waiting/complete)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)

If all params are null, the api returns all the tasks

* Add the state to each task returned by tasks endpoint

* divide active tasks into waiting, pending or running
* Add more unit tests

* Add UNKNOWN state to TaskState

* Fix the authorization calls

* WIP: PR comments

Added new class to capture task info for caching
Other refactoring

* Refactoring : move TaskStatus class to druid-api

so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api

* Remove unused class and apis accessing it

* Add a separate cache for recently completed tasks

This is to mainly capture the task type from payload

* Ignore a test

* Add a RuntimeTaskState to encompass all states a task can be in

* Revert "Add a RuntimeTaskState to encompass all states a task can be in"

This reverts commit 2a527a0731a064dc0f15cf2ba3dfc5f639c6e182.

* Fix wrong api call

* Fix and unignore tests

* Remove waiting,pending state from TaskState

* Add RunnerTaskState

* Missed the annotation runnerStatusCode

* Fix the creationTime

* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests

* Add javadocs

* Potentially fix the teamcity build

* Address PR comments

* Get rid of TaskInfoBuilder
* Make TaskInfoMapper a static nested class
* Other changes

* fix import in MaterializedViewSupervisor after merge

* Address PR comments on

* Replace global cache with local map
* combine multiple queries into one
* Removed unused code

* Fix unit tests

Fix a bug in securedTaskStatusPlus

* Remove getRecentlyFinishedTaskStatuses method

Change TaskInfoMapper signature to add generic type

* Address PR comments

* Passed datasource as argument to be used in sql query
* Other minor fixes

* Address PR comments

* Some minor changes, rename method, spacing changes

* Add early auth check if datasource is not null

* Fix test case

* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object

* Address PR comments

* Fix a bug in test case causing ClassCastException
2018-06-19 11:34:59 -07:00
Gian Merlino
6d0dd2fd0f CalciteQueryTest: Add more subquery tests. (#5880)
None of them actually work right now, but this is useful to help document,
via tests, what works and what doesn't.
2018-06-18 11:54:29 -07:00
Charles Allen
8dc4aca25f Add cgroup memory monitor (#5866)
* Add cgroup memory monitor

* Port of https://github.com/metamx/java-util/pull/67

* Fix copyright

* Don't use `String.format`
2018-06-18 10:03:44 -07:00
varaga
b4b1b2a020 Provisioning support for ZooKeeper Authorization (#5701)
Review comments implemented
2018-06-15 14:02:01 -07:00
Dylan Wylie
8c6651022d Update jsonpath dependency (#5794)
* Update JSONPath Library

Re: #5792

- Add a unit test containing a JSONPath conditional
- Update the JSONPath library and no longer exclude the json-smart dependency.
- I believe the original reason for excluding this has been fixed: https://github.com/json-path/JsonPath/pull/315

* Add test

* Fix test
2018-06-15 13:50:48 -07:00
Dylan Wylie
1f700bb880 Suppress JsonPath exceptions in AvroFlattener (#5793)
Re: #5791

- Make the AvroFlattenerMaker consistent with the JSONFlattenerMaker
2018-06-14 17:38:15 -07:00
Jonathan Wei
dc67b77ec2 Immediately send 401 on basic HTTP authentication failure (#5856)
* Immediately send 401 on basic HTTP authentication failure

* Add unit tests
2018-06-14 10:23:10 -07:00
Joseph Glanville
1032387d78 Snappy decompression support (#5864)
Support decompression of files using Google's Snappy algorithm.

This only supports files compressed with the Snappy framing format
described here: https://github.com/google/snappy/blob/master/framing_format.txt
2018-06-14 10:55:42 +01:00
Gian Merlino
e0eb7048f6 Remove evil.zip test file. (#5879)
Removes an evil.zip file added by #5850, since it's not necessary.
The tests in that patch create their own evil files.
2018-06-13 16:02:18 -07:00
Nishant Bangarwa
1c031784cb Align long Aggregator implementation with Double and Float (#5861)
Add LongMin/Max aggregator combiners
Extract common code from LongSum/Min/MaxAggregatorFactories in
SimpleLongAggregatorFactory
2018-06-14 01:56:41 +04:00
Jonathan Wei
24efbb054c Fix inefficient available segment cache population in SQLMetadataSegmentManager (#5878) 2018-06-12 18:53:30 -07:00
Jonathan Wei
bc9da54e12 Fix Zip Slip vulnerability (#5850)
* Fix evil zip exploit

* PR comment, checkstyle

* PR comments

* Add link to vulnerability report

* Fix test
2018-06-12 10:03:08 -07:00
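
The log entry above does not show the fix itself. As a minimal sketch of the usual guard against Zip Slip, assuming a hypothetical helper named `ZipSlipGuard.resolveEntry` (an illustration, not Druid's actual code), each archive entry name can be resolved against the target directory and rejected when its canonical path falls outside that directory:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only; not taken from the Druid patch.
final class ZipSlipGuard {
    /**
     * Resolves entryName under targetDir and rejects entries such as
     * "../../evil.sh" whose canonical path escapes the target directory.
     */
    static File resolveEntry(File targetDir, String entryName) throws IOException {
        File canonicalDir = targetDir.getCanonicalFile();
        File resolved = new File(canonicalDir, entryName).getCanonicalFile();
        // Path.startsWith compares whole path components, so "/out-evil"
        // is not treated as being inside "/out".
        if (!resolved.toPath().startsWith(canonicalDir.toPath())) {
            throw new IOException("Zip entry escapes target directory: " + entryName);
        }
        return resolved;
    }
}
```

An extraction loop would call `resolveEntry` for every entry before opening an output stream for it.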
Jihoon Son
2feec44a55 Fix mismatch in revoked task locks between memory and metastore after sync from storage (#5858)
* Fix mismatched revoked task locks after sync from storage

* fix build

* fix log

* fix lock release
2018-06-12 10:25:34 -04:00
Gian Merlino
0ae4aba4e2 HdfsDataSegmentPusher: Close tmpIndexFile before copying it. (#5873)
It seems that copy-before-close works OK on HDFS, but it doesn't work
on all filesystems. In particular, we observed this not working properly
with Google Cloud Storage. And anyway, it's better hygiene to close files
before attempting to copy them somewhere else.
2018-06-12 08:58:48 +01:00
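
The close-before-copy ordering described above can be sketched as follows. This is an assumption-laden illustration that uses the local filesystem through `java.nio.file` rather than the HDFS API, and the class and method names (`CloseBeforeCopy`, `writeThenCopy`) are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration; the real pusher works against Hadoop's FileSystem API.
final class CloseBeforeCopy {
    static void writeThenCopy(Path tmp, Path dest, byte[] data) throws IOException {
        // try-with-resources closes (and flushes) the stream when the block exits,
        // so the temporary file is complete before anything reads it.
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(data);
        }
        // Only now copy the fully written file to its destination.
        Files.copy(tmp, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```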
Jihoon Son
fe4d678aac Support projection after sorting in SQL (#5788)
* Add sort project

* add more test

* address comments
2018-06-11 11:33:47 -04:00
zhangxinyu
e43e5ebbcd Materialized view implementation (#5556)
* implement materialized view

* modify code according to jihoonson's comments

* modify code according to jihoonson's comments - 2

* add documentation about materialized view

* use new HadoopTuningConfig in pr 5583

* add minDataLag and fix optimizer bug

* correct value of DEFAULT_MIN_DATA_LAG_MS

* modify code according to jihoonson's comments - 3

* use the boolean expression instead of if-else
2018-06-09 12:24:54 -07:00
awelsh93
6f0aedd6ab Fix defaultQueryTimeout (#5807)
* Fix defaultQueryTimeout

- set default timeout in query context before query fail time is evaluated

Remove unused import

* Address failing checks

* Addressing code review comments

* Removed line that was no longer used
2018-06-08 15:34:10 -07:00
Caroline1000
96feb479cd add order change needed for KIS in 0.12.0 (#5760) 2018-06-08 15:25:26 -07:00
Fokko Driesprong
b7e8812d18 Bump Apache Parquet to 1.10.0 (#5776)
* Bump Apache Parquet to 1.8.3

* Bump Apache Parquet to 1.10.0
2018-06-08 15:23:35 -07:00
Hongze Zhang
cfa94b747b Update to jetty 9.4; Enable request decompression (#5624)
* Update to jetty 9.4; Enable request decompression; Add http compression config options

* Fix BadMessageException from jetty server at HttpGenerator.generateHeaders(...)
2018-06-08 14:53:08 -07:00
Charles Allen
0fd42af8d6 Make the google storage extension friendlier to 429 and 5XX responses (#5750)
* Make the google extension friendlier to 429 responses

* Lots of trouble for a little space

* Add in better tests and fix formatting

* Add 500 errors as well as some basic unit tests

* Add IOException to test

* Add some more stuff to killer test

* Change error code in puller test

* fix tests and make error handling more generic
2018-06-07 13:19:35 -07:00
awelsh93
adbe22c05b Security - add anonymous authenticator (#5842)
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.

* Adding documentation

* Removed some fields from class AnonymousAuthenticator

* Updating docs
2018-06-07 10:17:54 -07:00
Gian Merlino
3af95913a9 Lazy-ify IncrementalIndex filtering too. (#5852)
* Lazy-ify IncrementalIndex filtering too.

Follow-up to #5403, which only lazy-ified cursor-based filtering
on QueryableIndex.

* Fix logic error.
2018-06-06 18:03:34 -07:00
Siddharth Subramanian
37409dc2f4 Fix minor documentation error (#5851)
Adding a required `,` in the sample JSON
2018-06-06 12:51:56 -07:00
Ryan Plessner
ee45ee6915 Fix docs to reflect the correct default max total row count for the IndexTuningConfig (#5845) 2018-06-05 13:15:12 -07:00
awelsh93
1a4707f09c Remove extra slash in endpoint (#5822) 2018-06-05 13:11:26 -07:00
Jonathan Wei
684b5d18c1 Moving averages for ingestion row stats (#5748)
* Moving averages for ingestion row stats

* PR comments

* Make RowIngestionMeters extensible

* test and checkstyle fixes

* More PR comments

* Fix metrics

* Add some comments

* PR comments

* Comments
2018-06-05 09:08:57 -07:00
Gian Merlino
78fd27cdb2 Lazy-ify ValueMatcher BitSet optimization for string dimensions. (#5403)
* Lazy-ify ValueMatcher BitSet optimization for string dimensions.

The idea is that if the prior evaluated filters are decently selective,
such that they mean we won't see all possible values of the later
filters, then the eager version of the optimization is too wasteful.

This involves checking an extra bitset, but the overhead is small even
if the lazy-ification is useless.

* Remove import.

* Minor transformation
2018-06-05 09:06:51 -07:00
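
The lazy variant described above can be sketched roughly as follows. This is a simplified illustration, not the actual Druid ValueMatcher code: the class `LazyDictionaryMatcher`, its fields, and the `predicateCalls` counter are all hypothetical. The predicate is evaluated for a dictionary id only the first time that id is seen, and the extra bitset records which ids have already been evaluated:

```java
import java.util.BitSet;
import java.util.function.Predicate;

// Hypothetical sketch of lazy, memoized predicate evaluation over dictionary ids.
final class LazyDictionaryMatcher {
    private final String[] dictionary;         // dictionary id -> string value
    private final Predicate<String> predicate; // the filter to apply
    private final BitSet checked;              // ids whose result is already known
    private final BitSet matching;             // ids known to satisfy the predicate
    int predicateCalls = 0;                    // instrumentation for the example only

    LazyDictionaryMatcher(String[] dictionary, Predicate<String> predicate) {
        this.dictionary = dictionary;
        this.predicate = predicate;
        this.checked = new BitSet(dictionary.length);
        this.matching = new BitSet(dictionary.length);
    }

    boolean matches(int id) {
        if (!checked.get(id)) {                // the "extra bitset" check
            predicateCalls++;
            if (predicate.test(dictionary[id])) {
                matching.set(id);
            }
            checked.set(id);
        }
        return matching.get(id);
    }
}
```

An eager version would instead run the predicate over every dictionary entry up front, which is wasted work when earlier filters mean only a few ids are ever looked up.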
Alexander Saydakov
d1cdcd4895 Datasketches doc correction (#5816)
* func was renamed to operation during code review

* added missing descriptions, some cleanup
2018-06-05 17:52:37 +05:30
Clint Wylie
2b45a6a42d Fix topN lexicographic sort (#5815)
* fixes #5814
changes:
* pass `StorageAdapter` to topn algorithms to get things like if column is 'sorted' or if query interval is smaller than segment granularity, instead of using `io.druid.segment.Capabilities`
* remove `io.druid.segment.Capabilities` since it had one purpose, supplying `dimensionValuesSorted` which is now provided directly by `StorageAdapter`.
* added test for topn optimization path checking

* add Capabilities back since StorageAdapter is marked PublicApi

* oops

* add javadoc, fix build i think

* correctly revert api changes

* fix intellij fail

* fix typo :(
2018-05-31 09:53:29 -07:00
Atul Mohan
50ad7a45ff Fix authentication doc (#5813) 2018-05-30 11:10:48 -07:00
Jihoon Son
67ff7dacbd Support server-side encryption for s3 (#5740)
* Support server-side encryption for s3

* fix teamcity

* typo

* address comments

* Refactoring configuration injection

* fix doc

* fix doc
2018-05-28 20:22:08 -07:00
Joseph Glanville
5cbfb95e1f docs: Document inputFormat on Hadoop InputSpecs (#5784) 2018-05-24 21:44:37 -07:00
Jonathan Wei
8799d46fe9 Fix NPE in PrefetchableTextFilesFirehoseFactory (#5802) 2018-05-24 21:44:03 -07:00
Gian Merlino
bc0ff251a3 Docs: Clarify the meaning of maxSplitSize. (#5803) 2018-05-24 21:43:39 -07:00
Michael Schnupp
33b4eb624d fix freeSpacePercent in segmentCache.locations (#5765)
* fix freeSpacePercent in segmentCache.locations

* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash

* add test to verify correct behavior

* switch to Path and test with jimfs

Path allows the use of different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.

The behavior should be unchanged by this commit.

* Revert "switch to Path and test with jimfs"

This reverts commit 8b9a418d65a42a3adb87756967e780442484a9d9.
2018-05-24 11:15:30 +09:00
Gian Merlino
29af9f452a Fix for when Hadoop dataSource inputSpec is specified multiple times. (#5790)
This feature was introduced in #5717 but it didn't work in production
because this magical rewriter code wasn't also modified. Now, it is.
2018-05-23 03:16:55 +05:30
Atul Mohan
1b9611a60e Local indexing from RDBMS (#5441)
* Local indexing from RDBMS

* Fix content

* Remove pom changes

* Remove extraneous space

* Add tests and update documentation

* Fix comments

* Fix docs

* Fix build related issue

* Handle invalid strings

* Make target database independent of metadata storage

* Add firehose connector

* Fix accessibility

* Add docs

* Remove unused def

* Remove lazy instantiation of jsoniterator

* Move unused changes

* Move unused changes

* Fix build

* Make Sqlfirehose method private
2018-05-22 12:33:01 +09:00
Dylan Wylie
c537ea56f6 Validate dataschema datasource (#5785)
* Validate dataschema has a datasource

* Fix tests

* Use Guava Strings.isNullOrEmpty

* Inverse nullempty check, whoops
2018-05-18 16:29:06 -07:00
Gian Merlino
f2cc6ce4d5 VersionedIntervalTimeline: Optimize construction with heavily populated holders. (#5777)
* VersionedIntervalTimeline: Optimize construction with heavily populated holders.

Each time a segment is "add"ed to a timeline, "isComplete" is called on the holder
that it is added to. "isComplete" is an O(segments per chunk) operation, meaning
that adding N segments to a chunk is an O(N^2) operation. This blows up badly if
we have thousands of segments per chunk.

The patch defers the "isComplete" check until after all segments have been
inserted.

* Fix imports.
2018-05-16 09:16:59 -07:00
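
To make the cost argument above concrete, here is a toy model of the eager and deferred strategies. It is a sketch under stated assumptions: `ChunkHolder`, `buildEager`, `buildDeferred`, and the `elementsScanned` counter are hypothetical stand-ins, not the real VersionedIntervalTimeline classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a timeline holder; all names here are hypothetical.
final class ChunkHolder {
    private final List<Integer> partitions = new ArrayList<>();
    private final int expectedPartitions;
    long elementsScanned = 0; // counts the work done inside isComplete()

    ChunkHolder(int expectedPartitions) {
        this.expectedPartitions = expectedPartitions;
    }

    void add(int partitionNum) {
        partitions.add(partitionNum);
    }

    // O(segments in chunk): scans every partition added so far.
    boolean isComplete() {
        boolean[] seen = new boolean[expectedPartitions];
        int distinct = 0;
        for (int p : partitions) {
            elementsScanned++;
            if (p >= 0 && p < expectedPartitions && !seen[p]) {
                seen[p] = true;
                distinct++;
            }
        }
        return distinct == expectedPartitions;
    }

    // Eager: check completeness after every add, so total work grows as N^2.
    static ChunkHolder buildEager(int n) {
        ChunkHolder holder = new ChunkHolder(n);
        for (int i = 0; i < n; i++) {
            holder.add(i);
            holder.isComplete();
        }
        return holder;
    }

    // Deferred: add everything first, then check once, so total work grows as N.
    static ChunkHolder buildDeferred(int n) {
        ChunkHolder holder = new ChunkHolder(n);
        for (int i = 0; i < n; i++) {
            holder.add(i);
        }
        holder.isComplete();
        return holder;
    }
}
```

With N segments the eager build scans 1 + 2 + ... + N = N(N+1)/2 elements in total, while the deferred build scans N.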