This document describes how the proposed new hapi-empi feature will work.
Design Principles
Below are some simplifying principles hapi-empi enforces to reduce complexity and ensure data integrity.
- When EMPI is enabled on a hapi-fhir server, any Person resource in the repository that has the "hapi-empi" tag becomes effectively read-only via the FHIR endpoint. These Person resources are managed exclusively by hapi-empi. Users can only change them directly via hapi-empi REST operations. In most cases, users change them indirectly by creating and updating Patient and Practitioner ("Pat/Prac") resources. For the rest of this document, assume "Person" refers to a "hapi-empi" tagged Person resource.
- Every Pat/Prac resource in the system is MATCH-linked to a Person resource unless that Patient or Practitioner has the "no-empi" tag or has POSSIBLE_MATCH links pending review.
- Every Pat/Prac in the system has a MATCH link to at most one Person resource.
- The hapi-empi rules define a single identifier system that holds the external enterprise id ("EID"). If a Pat/Prac has an external EID, then the Person it links to always has the same EID. If a Pat/Prac has no EID when it arrives, the Person created from it is given an internal EID.
- A Person can have both an internal EID (auto-created by HAPI) and an external EID (provided by some external system).
- Two different Person resources cannot have the same EID.
- Pat/Prac resources are only ever compared to Person resources via this EID. For all other matches, Patient resources are only ever compared to Patient resources and Practitioner resources are only ever compared to Practitioner resources.
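The EID principles above can be illustrated with a minimal Python sketch. It is not the hapi-empi implementation: the two system URIs, the function names, and the use of plain dictionaries for resources are all assumptions made for illustration. In particular, the document does not say which identifier system an internal EID uses, so the separate internal system here is hypothetical.

```python
import uuid

# Hypothetical system URIs, chosen for illustration only.
EXTERNAL_EID_SYSTEM = "http://example.org/external-eid"
INTERNAL_EID_SYSTEM = "http://example.org/internal-eid"


def external_eid(resource, eid_system=EXTERNAL_EID_SYSTEM):
    """Return the resource's identifier value in the EID system, or None."""
    for identifier in resource.get("identifier", []):
        if identifier.get("system") == eid_system:
            return identifier.get("value")
    return None


def eid_for_new_person(resource, eid_system=EXTERNAL_EID_SYSTEM):
    """Pick the EID identifier for a Person created from this Pat/Prac."""
    value = external_eid(resource, eid_system)
    if value is not None:
        # The Person carries the same external EID as the incoming resource.
        return {"system": eid_system, "value": value}
    # No external EID on the incoming resource: generate an internal one.
    return {"system": INTERNAL_EID_SYSTEM, "value": str(uuid.uuid4())}
```

A Pat/Prac that arrives with an identifier in the external system passes that value on to its Person, and any other Pat/Prac yields a freshly generated UUID.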
Links
- hapi-empi manages empi-link records ("links") that link a Pat/Prac resource to a Person resource. When these links are changed by matching rules, they are marked as AUTO. When they are changed manually, they are marked as MANUAL.
- Once a link has been manually assigned as NO_MATCH or MATCHED, the system will not change it.
- When a Pat/Prac resource is created or updated, it is compared to all other resources of the same type in the repository. The outcome of each of these comparisons is either NO_MATCH, POSSIBLE_MATCH or MATCHED.
- Whenever a MATCHED link is established between a Pat/Prac resource and a Person resource, that Pat/Prac is always added to that Person resource's links. All MATCHED links have corresponding Person resource links and all Person resource links have corresponding MATCHED empi-link records. You can think of the fields of the empi-link records as extra meta-data associated with each Person.link.target.
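As a rough illustration of the link rules above, the following Python sketch models an empi-link record and the rule that manually decided links are never changed by the system. The class, field, and function names are hypothetical and do not reflect the actual hapi-empi data model.

```python
from dataclasses import dataclass


@dataclass
class EmpiLink:
    person_id: str
    target_id: str      # id of the linked Pat/Prac resource
    match_result: str   # "NO_MATCH", "POSSIBLE_MATCH" or "MATCHED"
    link_source: str    # "AUTO" or "MANUAL"


def apply_rule_outcome(link, new_result):
    """Apply a matching-rule outcome to a link; return True if it changed."""
    manually_decided = (
        link.link_source == "MANUAL"
        and link.match_result in ("NO_MATCH", "MATCHED")
    )
    if manually_decided:
        # A manual NO_MATCH or MATCHED decision is never changed by the system.
        return False
    link.match_result = new_result
    link.link_source = "AUTO"
    return True


def person_link_targets(links, person_id):
    """Targets that belong in Person.link: exactly the MATCHED links."""
    return [
        link.target_id
        for link in links
        if link.person_id == person_id and link.match_result == "MATCHED"
    ]
```

Deriving the Person's link targets from the MATCHED records is one simple way to keep the two in step, as the last bullet requires.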
Possible rule match outcomes:
When a new Pat/Prac resource is compared with all other resources of that type in the repository, there are four possible cases:
- CASE 1: No MATCHED and no POSSIBLE_MATCH outcomes -> a new Person resource is created and linked to that Pat/Prac. All fields are copied from the Pat/Prac to the Person. If the incoming resource has an EID, it is copied to the Person. Otherwise a new UUID is created and used as the internal EID.
- CASE 2: All of the MATCHED Pat/Prac resources are already linked to the same Person -> a new link is created between the new Pat/Prac and that Person and is set to MATCHED.
- CASE 3: The MATCHED Pat/Prac resources link to more than one Person -> all links are marked as POSSIBLE_MATCH. All Person resources other than the first are marked as POSSIBLE_DUPLICATE of the first Person. These duplicates are manually reviewed later and either merged or marked as NO_MATCH. Once marked as NO_MATCH, the system no longer considers them a POSSIBLE_DUPLICATE. POSSIBLE_DUPLICATE is the only link type that can have a Person as both the source and target of the link.
- CASE 4: Only POSSIBLE_MATCH outcomes -> empi-link records are created with the POSSIBLE_MATCH outcome and await manual assignment to either NO_MATCH or MATCHED. Person resources are not changed.
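The four cases can be sketched as a small decision function in Python. This is a simplified illustration rather than the hapi-empi implementation: the function name and input shapes are hypothetical, and it assumes that every MATCHED candidate is already linked to a Person. It also assumes MATCHED outcomes take precedence when POSSIBLE_MATCH outcomes are present as well, which the cases above do not spell out.

```python
def classify_outcomes(outcomes, person_of):
    """Decide which of the four cases applies to an incoming Pat/Prac.

    outcomes:  dict of candidate id -> "NO_MATCH", "POSSIBLE_MATCH" or "MATCHED"
    person_of: dict of candidate id -> id of the Person it is MATCH-linked to
    Returns (case name, sorted list of Person ids involved).
    """
    matched = [c for c, result in outcomes.items() if result == "MATCHED"]
    possible = [c for c, result in outcomes.items() if result == "POSSIBLE_MATCH"]

    if not matched and not possible:
        # CASE 1: nothing matched, so a new Person would be created.
        return "CASE 1", []
    if matched:
        persons = sorted({person_of[c] for c in matched})
        if len(persons) == 1:
            # CASE 2: every match points at the same Person.
            return "CASE 2", persons
        # CASE 3: matches disagree; the Persons are possible duplicates.
        return "CASE 3", persons
    # CASE 4: only POSSIBLE_MATCH outcomes, awaiting manual review.
    return "CASE 4", []
```

For example, two MATCHED candidates linked to different Person resources fall into CASE 3, and the returned list names the Person resources that need duplicate review.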
Rules
hapi-empi rules are managed via a single JSON document. This document contains a version, and empi-links derived from these rules are marked with that version. The following configuration is stored in the rules:
- resourceSearchParams: These define fields which must have at least one exact match before two resources are considered for matching. This is like a list of "pre-searches" that find potential candidates for matches, to avoid the expensive operation of running a match score calculation on all resources in the system. E.g. you may only wish to consider matching two Patients if they either share at least one identifier in common or have the same birthday.
[ {
"resourceType" : "Patient",
"searchParam" : "birthdate"
}, {
"resourceType" : "Patient",
"searchParam" : "identifier"
} ]
- filterSearchParams: When searching for match candidates, only resources that match this filter are considered. E.g. you may wish to only search for Patients for which active=true.
[ {
"resourceType" : "Patient",
"searchParam" : "active",
"fixedValue" : "true"
} ]
- matchFields: Once the match candidates have been found, each is assigned a match vector that marks which fields match. The match vector is determined by a list of matchFields. Each matchField defines a name, a distance metric, a success threshold, a resource type, and a resource path to check. For example:
{
"name" : "given-name",
"resourceType" : "Patient",
"resourcePath" : "name.given",
"metric" : "COSINE",
"matchThreshold" : 0.8
}
Note that in all the above JSON, valid options for resourceType are Patient, Practitioner, and All. Use All if the criteria are identical across both resource types and you would like to apply the pre-search to both practitioners and patients equally.
- weightMap: A map which converts combinations of successful matchFields into an EMPI Match Result score for overall matching of a given pair of resources.
"weightMap" : {
"given-name" : "POSSIBLE_MATCH",
"given-name,last-name" : "MATCH"
}
- eidSystem: The external EID system that the HAPI-EMPI system should expect to see on incoming Pat/Prac resources. Must be a valid URI.
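To show how these rule sections might fit together, here is a minimal Python sketch that applies the filter, the pre-search, the match fields, and the weight map to toy resources. It is an illustration under several assumptions, not the hapi-empi implementation: resources are simplified dictionaries rather than real FHIR structures, the "last-name" field and its "name.family" path are hypothetical, and `difflib.SequenceMatcher` stands in for the configured distance metric (it is not COSINE). The sketch also assumes that weightMap keys list field names in matchFields order and that a combination missing from the map means NO_MATCH.

```python
from difflib import SequenceMatcher

RULES = {
    "resourceSearchParams": [
        {"resourceType": "Patient", "searchParam": "birthdate"},
        {"resourceType": "Patient", "searchParam": "identifier"},
    ],
    "filterSearchParams": [
        {"resourceType": "Patient", "searchParam": "active", "fixedValue": "true"},
    ],
    "matchFields": [
        {"name": "given-name", "resourceType": "Patient",
         "resourcePath": "name.given", "matchThreshold": 0.8},
        # Hypothetical second field so the "given-name,last-name" key can apply.
        {"name": "last-name", "resourceType": "Patient",
         "resourcePath": "name.family", "matchThreshold": 0.8},
    ],
    "weightMap": {"given-name": "POSSIBLE_MATCH", "given-name,last-name": "MATCH"},
}


def applies(rule, resource):
    """A rule applies to its own resource type, or to every type if 'All'."""
    return rule["resourceType"] in ("All", resource["resourceType"])


def values(resource, key):
    """Return a toy resource's values for a key as a set."""
    value = resource.get(key)
    if value is None:
        return set()
    return set(value) if isinstance(value, list) else {value}


def is_candidate(incoming, other, rules):
    """Filter first, then require at least one exact pre-search match."""
    if incoming["resourceType"] != other["resourceType"]:
        return False
    for rule in rules["filterSearchParams"]:
        if applies(rule, other):
            actual = {str(v).lower() for v in values(other, rule["searchParam"])}
            if rule["fixedValue"] not in actual:
                return False
    return any(
        applies(rule, incoming)
        and bool(values(incoming, rule["searchParam"]) & values(other, rule["searchParam"]))
        for rule in rules["resourceSearchParams"]
    )


def resolve(resource, path):
    """Follow a dotted path through nested dictionaries."""
    value = resource
    for part in path.split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value


def match_vector(a, b, rules):
    """Names of the matchFields whose similarity meets the threshold."""
    matched = []
    for field in rules["matchFields"]:
        if not applies(field, a):
            continue
        left = resolve(a, field["resourcePath"])
        right = resolve(b, field["resourcePath"])
        if left is None or right is None:
            continue
        # Stand-in similarity score in [0, 1]; not the COSINE metric.
        score = SequenceMatcher(None, left.lower(), right.lower()).ratio()
        if score >= field["matchThreshold"]:
            matched.append(field["name"])
    return matched


def match_result(vector, rules):
    """Look up the combination of matched fields in the weightMap."""
    return rules["weightMap"].get(",".join(vector), "NO_MATCH")
```

In this sketch, the expensive `match_vector` comparison would only be run on resources for which `is_candidate` returns true, which is the purpose of the pre-search described above.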