hapi-fhir

Table of Contents

Goal
Proposed Design

Goal

This page documents the initial design for a scalable multitenancy strategy. The strategy has two goals:

It can be used to provide logical, secure segregation of data (e.g. when a user performs a search for "find all patients in tenant 123 with name = 'smith'", they should receive only results that actually belong to this tenant).
It can be used to create logical partitions of data for partition-based archiving. Note that in this scenario, some prep work may be required to resolve resource references into the archived partition before the partition is removed.

The following tables are all related to an individual resource instance in the database:

HFJ_RESOURCE
HFJ_RES_VER
HFJ_RES_TAG
HFJ_FORCED_ID
HFJ_IDX_CMP_STRING_UNIQ
HFJ_SPIDX_COORDS
HFJ_SPIDX_DATE
HFJ_SPIDX_NUMBER
HFJ_SPIDX_QUANTITY
HFJ_SPIDX_STRING
HFJ_SPIDX_TOKEN
HFJ_SPIDX_URI
HFJ_RES_LINK

Proposed Design

Each of these tables would add a new integer discriminator column called "tenant" and a new "tenant_date" column. hapi-fhir clients can populate these columns by calling resource.setUserData("TENANT", value) and resource.setUserData("TENANT_DATE", value) in a PRESTORAGE interceptor. When persisting the records associated with that resource, hapi-fhir will set the tenant column to the value found in the populated userData. If no TENANT is provided, hapi-fhir will default the value to 0.
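The defaulting rule described above can be illustrated with a minimal, self-contained sketch. It does not use the hapi-fhir API: SketchResource and TenantResolver are hypothetical stand-ins, and the use of java.time.LocalDate for TENANT_DATE is an assumption, since this page does not specify the column type.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a resource that carries user data (not a hapi-fhir class). */
class SketchResource {
    private final Map<String, Object> userData = new HashMap<>();

    void setUserData(String key, Object value) {
        userData.put(key, value);
    }

    Object getUserData(String key) {
        return userData.get(key);
    }
}

/** Hypothetical helper showing how the tenant columns could be derived at persist time. */
class TenantResolver {
    static final String TENANT_KEY = "TENANT";
    static final String TENANT_DATE_KEY = "TENANT_DATE";
    static final int DEFAULT_TENANT = 0;

    /** Returns the tenant set by user code, or 0 when none was provided. */
    static int resolveTenant(SketchResource resource) {
        Object value = resource.getUserData(TENANT_KEY);
        return (value instanceof Integer) ? (Integer) value : DEFAULT_TENANT;
    }

    /** Returns the tenant date set by user code, or null when none was provided (assumed type). */
    static LocalDate resolveTenantDate(SketchResource resource) {
        Object value = resource.getUserData(TENANT_DATE_KEY);
        return (value instanceof LocalDate) ? (LocalDate) value : null;
    }
}
```

In this sketch, user code would call setUserData before storage, and the persistence layer would call resolveTenant once per resource and write the result to every related row.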

The key features of this design are:

Tenant selection will be done when a resource is created. User code will have ultimate discretion about which tenant a resource belongs to, so it might be decided based on a URL prefix, a header, a hidden attribute of the logged-in user, etc.
It will be possible to perform searches that are strictly restricted to one tenant, but it will also be possible to perform searches that cross tenant boundaries (e.g. the logged-in user can access tenants A+B+C but not D, or the logged-in user can access all tenants).
Because all resource-relevant tables will have a consistent tenant identifier via the pair of new columns, it will be possible to apply sharding and partitioning strategies at the database level using this identifier as a key.
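The single-tenant and cross-tenant search behaviour described above can be sketched in memory as follows. This is only an illustration under assumptions: StringIndexRow loosely models a row of HFJ_SPIDX_STRING with the proposed tenant column, and TenantScopedSearch is a hypothetical helper, not part of hapi-fhir. A real implementation would presumably express the same restriction as a predicate on the tenant column in the generated SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a string index row that carries the proposed tenant column. */
class StringIndexRow {
    final int tenant;
    final long resourceId;
    final String value;

    StringIndexRow(int tenant, long resourceId, String value) {
        this.tenant = tenant;
        this.resourceId = resourceId;
        this.value = value;
    }
}

/** Hypothetical helper that restricts a search to the tenants the caller may access. */
class TenantScopedSearch {
    /**
     * Returns the IDs of resources whose indexed value matches (ignoring case)
     * and whose tenant is in the allowed set. A set with one element gives a
     * strictly single-tenant search; a larger set gives a cross-tenant search.
     */
    static List<Long> findByValue(List<StringIndexRow> rows, String value, Set<Integer> allowedTenants) {
        List<Long> matches = new ArrayList<>();
        for (StringIndexRow row : rows) {
            if (allowedTenants.contains(row.tenant) && row.value.equalsIgnoreCase(value)) {
                matches.add(row.resourceId);
            }
        }
        return matches;
    }
}
```

User code would decide the allowed set per request, for example from the logged-in user's permissions, and pass it to every search.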