---
layout: doc_page
title: "Kafka Lookups"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

# Kafka Lookups

<div class="note caution">
Lookups are an <a href="../experimental.html">experimental</a> feature.
</div>

Make sure to [include](../../operations/including-extensions.html) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` as extensions.

If you need lookup updates to propagate as promptly as possible, you can use a Kafka topic as a `LookupExtractorFactory`. Each message key is the old value and each message body is the desired new value (both UTF-8 encoded).

```json
{
  "type":"kafka",
  "kafkaTopic":"testTopic",
  "kafkaProperties":{"zookeeper.connect":"somehost:2181/kafka"}
}
```

|Parameter|Description|Required|Default|
|---------|-----------|--------|-------|
|`kafkaTopic`|The Kafka topic to read the data from|Yes||
|`kafkaProperties`|Kafka consumer properties. At least `zookeeper.connect` must be specified. Only the ZooKeeper connector is supported|Yes||
|`connectTimeout`|How long to wait for an initial connection|No|`0` (do not wait)|
|`isOneToOne`|Whether the map is one-to-one (see [Lookup DimensionSpecs](../../querying/dimensionspecs.html))|No|`false`|
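As a sketch of how the optional parameters fit into the spec, the following is the same example with both optional fields written out at the defaults listed in the table. It should behave the same as the minimal spec; the topic name and ZooKeeper address are the placeholder values used earlier on this page.

```json
{
  "type": "kafka",
  "kafkaTopic": "testTopic",
  "kafkaProperties": {
    "zookeeper.connect": "somehost:2181/kafka"
  },
  "connectTimeout": 0,
  "isOneToOne": false
}
```

Set `isOneToOne` to `true` only if every key in the topic maps to a distinct value.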

The extension `druid-kafka-extraction-namespace` enables reading from a Kafka feed of key/value pairs to allow renaming of dimension values. A typical use case is renaming an ID to a human-readable name.

The consumer properties `group.id` and `auto.offset.reset` cannot be set in `kafkaProperties`. The extension sets them itself, to `UUID.randomUUID().toString()` and `smallest` respectively.
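For illustration, a spec that passes an extra consumer property might look like the sketch below. The `zookeeper.session.timeout.ms` entry and its value are assumptions chosen only as an example of a setting from the older ZooKeeper-based Kafka consumer; check the property names against the Kafka version you run. Neither `group.id` nor `auto.offset.reset` appears, since the extension supplies both.

```json
{
  "type": "kafka",
  "kafkaTopic": "testTopic",
  "kafkaProperties": {
    "zookeeper.connect": "somehost:2181/kafka",
    "zookeeper.session.timeout.ms": "6000"
  }
}
```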

See [lookups](../../querying/lookups.html) for how to configure and use lookups.

# Limitations

Currently the Kafka lookup extractor feeds the entire Kafka stream into a local cache. With OnHeap caching, a stream that contains many unique keys can easily exhaust the Java heap.
OffHeap caching should alleviate this concern, but the amount of data that can be stored is still limited.
There is currently no eviction policy.

## Testing the Kafka rename functionality

To test this setup, you can send key/value pairs to a Kafka topic with the following console producer command:

```
./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic
```

Renames can then be published as `OLD_VAL->NEW_VAL`, each followed by a newline (Enter or Return).