Lookups
Lookups are a concept in Druid where dimension values are (optionally) replaced with new values.
See dimension specs for more information. For the purpose of these documents,
a "key" refers to a dimension value to match, and a "value" refers to its replacement.
So if you wanted to rename `appid-12345` to `Super Mega Awesome App`, then the key would be `appid-12345` and the value would be `Super Mega Awesome App`.
It is worth noting that lookups support use cases where keys map to unique values (injective), such as a country code and a country name, and also use cases where multiple IDs map to the same value, e.g. multiple app-ids belonging to a single account manager.
Lookups do not have history. They always use the current data. This means that if the chief account manager for a particular app-id changes, and you issue a query with a lookup to store the app-id to account manager relationship, it will return the current account manager for that app-id REGARDLESS of the time range over which you query.
Lookups that are sensitive to the data's time range are not currently supported dynamically at query time; such data belongs in the raw, denormalized data ingested into Druid.
Very small lookups (count of keys on the order of a few dozen to a few hundred) can be passed at query time as a "map" lookup as per dimension specs.
Other lookup types are available as extensions, including:
- Globally cached lookups from local files, remote URIs, or JDBC through lookups-cached-global.
- Globally cached lookups from a Kafka topic through kafka-extraction-namespace.
Dynamic Configuration
These configs are accessed using JSON through the following URI template:

`http://<COORDINATOR_IP>:<PORT>/druid/coordinator/v1/lookups/{tier}/{id}`

All URIs below are assumed to have `http://<COORDINATOR_IP>:<PORT>` prepended.
If you have NEVER configured lookups before, you MUST post an empty JSON object `{}` to `/druid/coordinator/v1/lookups` to initialize the configuration.
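As a minimal sketch, assuming a Coordinator listening at `localhost:8081` (its default port), the empty object could be posted from Python like this:

```python
import urllib.request

# Coordinator assumed to be at localhost:8081; adjust host/port for your cluster.
url = "http://localhost:8081/druid/coordinator/v1/lookups"

req = urllib.request.Request(
    url,
    data=b"{}",  # empty JSON object initializes the lookup configuration
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 202 (accepted)
```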
These endpoints will return one of the following results:
- 404 if the resource is not found
- 400 if there is a problem in the formatting of the request
- 202 if the request was accepted asynchronously (`POST` and `DELETE`)
- 200 if the request succeeded (`GET` only)
Configuration propagation behavior
The configuration is propagated to the query serving nodes (broker / router / peon / historical) by the coordinator. The query serving nodes have an internal API for managing lookups on the node, which the coordinator uses. The coordinator periodically checks whether any of the nodes need to load or drop lookups and updates them accordingly.
Bulk update
Lookups can be updated in bulk by posting a JSON object to `/druid/coordinator/v1/lookups`. The format of the JSON object is as follows:
{
"<tierName>": {
"<lookupName>": {
"version": "<version>",
"lookupExtractorFactory": {
"type": "<someExtractorFactoryType>",
"<someExtractorField>": "<someExtractorValue>"
}
}
}
}
Note that "version" is an arbitrary string assigned by the user, when making updates to existing lookup then user would need to specify a lexicographically higher version.
For example, a config might look something like:
{
"__default": {
"country_code": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"77483": "United States"
}
}
},
"site_id": {
"version": "v0",
"lookupExtractorFactory": {
"type": "cachedNamespace",
"extractionNamespace": {
"type": "jdbc",
"connectorConfig": {
"createTables": true,
"connectURI": "jdbc:mysql:\/\/localhost:3306\/druid",
"user": "druid",
"password": "diurd"
},
"table": "lookupTable",
"keyColumn": "country_id",
"valueColumn": "country_name",
"tsColumn": "timeColumn"
},
"firstCacheTimeout": 120000,
"injective": true
}
},
"site_id_customer1": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"847632": "Internal Use Only"
}
}
},
"site_id_customer2": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"AHF77": "Home"
}
}
}
},
"realtime_customer1": {
"country_code": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"77483": "United States"
}
}
},
"site_id_customer1": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"847632": "Internal Use Only"
}
}
}
},
"realtime_customer2": {
"country_code": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"77483": "United States"
}
}
},
"site_id_customer2": {
"version": "v0",
"lookupExtractorFactory": {
"type": "map",
"map": {
"AHF77": "Home"
}
}
}
}
}
All entries in the map will UPDATE existing entries. No entries will be deleted.
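As an illustrative sketch (the tier and lookup shown here are hypothetical, and the Coordinator is assumed to be at `localhost:8081`), a bulk update could be posted like this:

```python
import json
import urllib.request

# Hypothetical bulk update: one map lookup in the __default tier.
bulk_update = {
    "__default": {
        "country_code": {
            "version": "v1",  # must be lexicographically higher than the existing version
            "lookupExtractorFactory": {
                "type": "map",
                "map": {"77483": "United States"},
            },
        }
    }
}

req = urllib.request.Request(
    "http://localhost:8081/druid/coordinator/v1/lookups",
    data=json.dumps(bulk_update).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 202 (accepted)
```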
Update Lookup
A `POST` to a particular lookup extractor factory via `/druid/coordinator/v1/lookups/{tier}/{id}` will update that specific extractor factory.

For example, a post to `/druid/coordinator/v1/lookups/realtime_customer1/site_id_customer1` might contain the following:
{
"version": "v1",
"lookupExtractorFactory": {
"type": "map",
"map": {
"847632": "Internal Use Only"
}
}
}
This will replace the `site_id_customer1` lookup in the `realtime_customer1` tier with the definition above.
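A minimal sketch of that request (Coordinator assumed at `localhost:8081`):

```python
import json
import urllib.request

# Tier and lookup id from the example above.
url = ("http://localhost:8081/druid/coordinator/v1/lookups"
       "/realtime_customer1/site_id_customer1")

payload = {
    "version": "v1",  # lexicographically higher than the previous "v0"
    "lookupExtractorFactory": {
        "type": "map",
        "map": {"847632": "Internal Use Only"},
    },
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 202 (accepted)
```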
Get Lookup
A `GET` to a particular lookup extractor factory is accomplished via `/druid/coordinator/v1/lookups/{tier}/{id}`.

Using the prior example, a `GET` to `/druid/coordinator/v1/lookups/realtime_customer2/site_id_customer2` should return:
{
"version": "v1",
"lookupExtractorFactory": {
"type": "map",
"map": {
"AHF77": "Home"
}
}
}
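A sketch of fetching and inspecting that definition (Coordinator assumed at `localhost:8081`; the fields accessed below match the response shown above):

```python
import json
import urllib.request

url = ("http://localhost:8081/druid/coordinator/v1/lookups"
       "/realtime_customer2/site_id_customer2")

with urllib.request.urlopen(url) as resp:
    lookup = json.load(resp)  # expect 200 with the JSON body shown above

print(lookup["version"], lookup["lookupExtractorFactory"]["map"])
```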
Delete Lookup
A `DELETE` to `/druid/coordinator/v1/lookups/{tier}/{id}` will remove that lookup from the cluster.
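A sketch of such a request (Coordinator assumed at `localhost:8081`; tier and lookup names taken from the earlier examples):

```python
import urllib.request

req = urllib.request.Request(
    "http://localhost:8081/druid/coordinator/v1/lookups"
    "/realtime_customer1/site_id_customer1",
    method="DELETE",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 202 (accepted)
```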
List tier names
A `GET` to `/druid/coordinator/v1/lookups` will return a list of known tier names in the dynamic configuration. To discover the tiers currently active in the cluster instead of the ones known in the dynamic configuration, add the parameter `discover=true`, as in `/druid/coordinator/v1/lookups?discover=true`.
List lookup names
A `GET` to `/druid/coordinator/v1/lookups/{tier}` will return a list of known lookup names for that tier.
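A sketch combining both listing calls (Coordinator assumed at `localhost:8081`; `__default` is used as the tier name for illustration):

```python
import json
import urllib.request

base = "http://localhost:8081/druid/coordinator/v1/lookups"

# Tiers currently active in the cluster (drop ?discover=true to get only the
# tiers known in the dynamic configuration).
with urllib.request.urlopen(base + "?discover=true") as resp:
    tiers = json.load(resp)

# Lookup names configured in one tier.
with urllib.request.urlopen(base + "/__default") as resp:
    lookup_names = json.load(resp)

print(tiers)
print(lookup_names)
```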
Internal API
The Peon, Router, Broker, and Historical nodes all have the ability to consume lookup configuration.
There is an internal API these nodes use to list/load/drop their lookups, rooted at `/druid/listen/v1/lookups`. These endpoints follow the same conventions for return values as the cluster-wide dynamic configuration. The following endpoints can be used for debugging purposes, but should not be relied on otherwise.
Get Lookups
A `GET` to the node at `/druid/listen/v1/lookups` will return a JSON map of all the lookups currently active on the node. The return value maps lookup names to their extractor factories.
{
"site_id_customer2": {
"version": "v1",
"lookupExtractorFactory": {
"type": "map",
"map": {
"AHF77": "Home"
}
}
}
}
Get Lookup
A `GET` to the node at `/druid/listen/v1/lookups/some_lookup_name` will return the LookupExtractorFactory for the lookup identified by `some_lookup_name`. The return value will be the JSON representation of the factory.
{
"version": "v1",
"lookupExtractorFactory": {
"type": "map",
"map": {
"AHF77": "Home"
}
}
}
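Putting the two internal endpoints together, a debugging sketch (a Historical assumed at `localhost:8083`, its default port; the lookup name is hypothetical):

```python
import json
import urllib.request

# Any lookup-consuming node (Peon, Router, Broker, Historical) exposes these endpoints.
node = "http://localhost:8083"

# All lookups currently active on the node.
with urllib.request.urlopen(node + "/druid/listen/v1/lookups") as resp:
    print(json.load(resp))

# A single lookup's extractor factory, by name.
with urllib.request.urlopen(node + "/druid/listen/v1/lookups/site_id_customer2") as resp:
    print(json.load(resp))
```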
Configuration
See the coordinator configuration guide for coordinator configuration details.

To configure a Broker / Router / Historical / Peon to announce itself as part of a lookup tier, use the `druid.lookup.lookupTier` property.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.lookup.lookupTier`|The tier for lookups for this node. This is independent of other tiers.|`__default`|
|`druid.lookup.lookupTierIsDatasource`|For some things like indexing service tasks, the datasource is passed in the runtime properties of a task. This option fetches the tierName from the same value as the datasource for the task. It is suggested to only use this as a peon option for the indexing service, if at all. If true, `druid.lookup.lookupTier` MUST NOT be specified.|`"false"`|
To configure the behavior of the dynamic configuration manager, use the following properties on the coordinator:
|Property|Description|Default|
|--------|-----------|-------|
|`druid.manager.lookups.hostTimeout`|Timeout (in ms) PER HOST for processing a request.|2000 (2 seconds)|
|`druid.manager.lookups.allHostTimeout`|Timeout (in ms) to finish lookup management on all the nodes.|900000 (15 mins)|
|`druid.manager.lookups.period`|How long to pause between management cycles.|120000 (2 mins)|
|`druid.manager.lookups.threadPoolSize`|Number of service nodes that can be managed concurrently.|10|
Saving configuration across restarts
It is possible to save the configuration across restarts so that a node will not have to wait for coordinator action to re-populate its lookups. To do this, set the following property:
|Property|Description|Default|
|--------|-----------|-------|
|`druid.lookup.snapshotWorkingDir`|Working path used to store a snapshot of the current lookup configuration. Leaving this property null will disable the snapshot/bootstrap utility.|null|
Introspect a Lookup
Lookup implementations can provide some introspection capabilities by implementing `LookupIntrospectHandler`. Users can send requests to `/druid/lookups/v1/introspect/{lookupId}` to introspect a given lookup.

For instance, you can list all the keys/values of a map-based lookup by issuing a `GET` request to `/druid/lookups/v1/introspect/{lookupId}/keys` or `/druid/lookups/v1/introspect/{lookupId}/values`.
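A sketch of introspecting a map lookup (a Broker assumed at `localhost:8082`, its default port; `country_code` is a hypothetical lookup loaded on that node):

```python
import urllib.request

base = "http://localhost:8082/druid/lookups/v1/introspect/country_code"

with urllib.request.urlopen(base + "/keys") as resp:
    print(resp.read().decode())   # keys of the map lookup

with urllib.request.urlopen(base + "/values") as resp:
    print(resp.read().decode())   # values of the map lookup
```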
Druid version 0.10.0 to 0.10.1 upgrade/downgrade
Overall druid cluster lookups configuration is persisted in the metadata store, and individual lookup nodes optionally persist a snapshot of loaded lookups on disk. If upgrading from druid version 0.10.0 to 0.10.1, migration of all persisted metadata is handled automatically. If downgrading from 0.10.1 to 0.9.0, lookup updates made via the coordinator while 0.10.1 was running would be lost.