There is no such thing as a "Java aggregator" in Druid from a user's point of view, there are just native aggregator that happen to be implemented in Java.
- Make redirects for old links based on _redirects.json
- Replace #{DRUIDVERSION} tokens in docs with current version
- Allow origins named something other than "origin"
- Can use either s3cmd or awscli, depending on availability
* validate X-Druid-Task-Id header in request and add header to response
* modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant
* support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespecd lookup extensions
- druid-namespace-lookup and druid-kafka-extraction-namespace are modified
- However, druid-namespace-lookup still has configuration about ON/OFF
HEAP cache manager selection, which is not namespace wide
configuration but node wide configuration as multiple namespace shares
the same cache manager
* update KafkaExtractionNamespaceTest to reflect argument signature changes
* Add more synchronization functionality to NamespaceLookupExtractorFactory
* Remove old way of using extraction namespaces
* resolve compile error by supporting LookupIntrospectHandler
* Remove kafka lookups
* Remove unused stuff
* Fix start and stop behavior to be consistent with new javadocs
* Remove unused strings
* Add timeout option
* Address comments on configurations and improve docs
* Add more options and update hash key and replaces
* Move monitoring to the overriding classes
* Add better start/stop logging
* Remove old docs about namespace names
* Fix bad comma
* Add `@JsonIgnore` to lookup factory
* Address code review comments
* Remove ExtractionNamespace from module json registration
* Fix problems with naming and initialization. Add tests
* Optimize imports / reformat
* Fix future not being properly cancelled on failed initial scheduling
* Fix delete returns
* Add more docs about whole introspection
* Add `/version` introspection point for lookups
* Add more tests and address comments
* Add StaticMap extraction namespace for testing. Also add a bunch of tests
* Move cache system property to `druid.lookup.namespace.cache.type`
* Make VERSION lower case
* Change poll period to 0ms for StaticMap
* Move cache key to bytebuffer
* Change hashCode and equals on static map extraction fn
* Add more comments on StaticMap
* Address comments
* Make scheduleAndWait use a latch
* Sanity renames and fix imports
* Remove extra info in docs
* Fix review comments
* Strengthen failure on start from warn to error
* Address comments
* Rename namespace-lookup to lookups-cached-global
* Fix injective mis-naming
* Also add serde test
* Allow dynamically setting of shutoffTime for EventReceiverFirehose
Allow dynamically setting shutoffTime for EventReceiverFirehose
review comments and tests
* shut down exec on close
* Add lookup optimization for InDimFilter
* tests for in filter with lookup extraction fn
* refactor
* refactor2 and modified filter test
* make optimizeLookup private
* Datasource as lookup tier
* Adds an option to let indexing service tasks pull their lookup tier from the datasource they are working for.
* Fix bad docs for lookups lookupTier
* Add Datasource name holder
* Move task and datasource to be pulled from Task file
* Make LookupModule pull from bound dataSource
* Fix test
* Fix code style on imports
* Fix formatting
* Make naming better
* Address code comments about naming
* Allows RegisteredLookupExtractionFn to find its lookups lazily
* Use raw variables instead of AtomicReference
* Make sure to use volatile
* Remove extra local variable.
* Move from BAOS to ByteBuffer
* new interval based cost function
Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression
New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices
* comments and formatting
* add more comments to explain the calculation
* Allow user to set cost balancer threads more than the number of cores.
Allow user to set cost balancer threads more than the number of cores.
* modify test