In the MDM User Interface, you need to provide the MDM Rule Definition Script, a JSON document describing exactly how and why two FHIR Resources should be linked together.
Here are the minimal fields that must be included in this JSON document:
{
  "version": "1",
  "mdmTypes": [],
  "candidateSearchParams": [],
  "candidateFilterSearchParams": [],
  "matchFields": [],
  "matchResultMap": {}
}
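As a quick sanity check before providing a rule definition in the MDM User Interface, the presence of these minimal fields can be verified with a few lines of Python. This is only a sketch: the helper names `missing_rule_fields` and `version_is_valid` are hypothetical and not part of the MDM module, and only the top-level keys and the `version` constraint stated in this document (a non-empty string of at most 16 characters) are checked.

```python
import json

# Top-level fields listed in the minimal rule definition.
REQUIRED_FIELDS = (
    "version",
    "mdmTypes",
    "candidateSearchParams",
    "candidateFilterSearchParams",
    "matchFields",
    "matchResultMap",
)


def missing_rule_fields(rules: dict) -> list:
    """Return the required top-level fields absent from a rule definition."""
    return [name for name in REQUIRED_FIELDS if name not in rules]


def version_is_valid(rules: dict) -> bool:
    """Check that 'version' is a non-empty string of at most 16 characters."""
    version = rules.get("version")
    return isinstance(version, str) and 0 < len(version) <= 16


minimal = json.loads("""
{
  "version": "1",
  "mdmTypes": [],
  "candidateSearchParams": [],
  "candidateFilterSearchParams": [],
  "matchFields": [],
  "matchResultMap": {}
}
""")

print(missing_rule_fields(minimal))  # []
print(version_is_valid(minimal))     # True
```

A check like this only catches structural omissions; it says nothing about whether the rules themselves produce sensible links.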
The HAPI FHIR MDM Rules section introduces these fields; more detailed information about each one is given below. To test how the different MDM algorithms behave, see the $mdm-evaluate operation.
MDM processing is divided into two main phases. The first phase finds candidate FHIR Resources. The second phase matches the newly created or updated input FHIR resource more precisely against the candidates found in the first phase, and finally creates links between them.
The first phase uses candidateSearchParams and candidateFilterSearchParams on the FHIR Resource types listed in mdmTypes to find the candidates; the second phase uses matchFields and matchResultMap to create the MDM links.
Also, the optional eidSystems field can be used to change how the MDM module creates links and Golden Record resources. See the Using Enterprise Identifiers (EIDs) in MDM Rule Definition section for more details.
Field Name | Brief Description | Notes and Comments |
---|---|---|
mdmTypes |
Lists the FHIR Resource types that will be analyzed by the MDM module. | Any FHIR Resource having an active identifier SearchParameter can be configured in MDM.
Even if multiple types are listed, the MDM module will only link resources of the same type together. For example, if |
candidateSearchParams |
Lists the SearchParameters that must have at least one exact match before two resources are considered for matching. | These are used to search for candidate FHIR Resources, based on the listed SearchParameters and the field values of the newly created or updated input FHIR resource.
The candidates found by these searches are then matched more precisely against the input FHIR resource in the second phase.
One search is run against the database for each entry in the candidateSearchParams list. The FHIR Resources must already be indexed with each SearchParameter used. Custom SearchParameters can be created and indexed beforehand and then used. The performance of the first phase depends entirely on the speed of these searches. Note that if the search queries return too many candidates, the matching process slows down significantly.
For that reason, fields that divide the data into numerous small groups are good choices, whereas fields that divide the data into a few large groups are poor choices for performance. Also, data associated with locations, or data that changes regularly in a resource, might prevent appropriate candidate resources from being found for the matching phase. |
candidateFilterSearchParams |
Used to add filters to be applied on the candidateSearchParams searches, to further minimize the number of candidates to analyze. |
The fields used in these filters must already be indexed with a SearchParameter. Custom SearchParameter can also be created/indexed beforehand and used for it.
An optional qualifier can also be used with the SearchParameter. Example of filtering: select resources having active status only, or a family name not equal to a particular test value. Typically, it is used to exclude resources from the groups of data selected by candidateSearchParams. Example: select resources excluding identifiers of a particular system or value. |
eidSystems |
Optional field used to specify which identifiers can be expected and used as unique identifiers on incoming resources. | Using this field affects the way the MDM module processes incoming resources. During the first phase, it will first try to find Golden Record resources having the specified EID before finding candidates. See the 'Using Enterprise Identifiers (EIDs) in MDM Rule Definition' section for more details. |
Usually, it is possible to test the searches that will be done during the first phase to find the candidate resources, to assess their performance.
For example, using this sample MDM rule definition (excluding matchFields and matchResultMap for simplicity):
{
  "version": "v2022-10-01",
  "mdmTypes": [ "Organization" ],
  "candidateSearchParams": [
    {
      "resourceType": "*",
      "searchParams": [
        "identifier"
      ]
    },
    {
      "resourceType": "Organization",
      "searchParams": [
        "name"
      ]
    }
  ],
  "candidateFilterSearchParams": [
    {
      "resourceType": "Organization",
      "searchParam": "active",
      "fixedValue": "true"
    },
    {
      "resourceType": "Organization",
      "searchParam": "type",
      "qualifier": "NOT",
      "fixedValue": "other"
    }
  ]
}
Here, the JSON document specifies that two searches should be done to find candidates: one with the identifier SearchParameter and one with the name SearchParameter. Also, the results of both searches should be filtered to include active resources only and to exclude all resources having a type code value of 'other'.
When adding this new Organization resource:
{
  "resourceType": "Organization",
  "identifier": [
    {
      "system": "http://mysite.com/fhir/system/our-internal-organization-id",
      "value": "MyOrg-123"
    }
  ],
  "name": "MyOrganization",
  "active": true,
  "type": [
    {
      "coding": [
        {
          "system": "http://terminology.hl7.org/CodeSystem/organization-type",
          "code": "edu",
          "display": "Educational Institute"
        }
      ]
    }
  ]
}
These equivalent searches will be run in parallel to find the candidate resources:
http://localhost:8000/Organization
http://localhost:8000/Organization
All Organization resources found by these queries are kept for the second phase and matched more precisely against the new 'MyOrg-123' Organization resource. If the search queries return no Organization resource, the second phase is skipped and no new MDM link is created (besides the one to a new Golden Record).
These searches are specific and fast, and should return few Organization resources as candidates, which is better for performance.
Ideally, these searches should be tested beforehand to make sure they run quickly and don't return too many resources.
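To try out such searches by hand, the candidate queries can be approximated from the rule definition and the incoming resource. The following Python sketch is an illustration only and does not reflect how the MDM module builds its queries internally. It assumes that each candidateSearchParams entry becomes one search, that every filter is appended to each search, that the NOT qualifier corresponds to the FHIR `:not` search modifier, and that an identifier is searched with the FHIR `system|value` token syntax using the first identifier only. The helper names `search_value` and `candidate_queries` are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # base URL used in the example above

rules = {
    "candidateSearchParams": [
        {"resourceType": "*", "searchParams": ["identifier"]},
        {"resourceType": "Organization", "searchParams": ["name"]},
    ],
    "candidateFilterSearchParams": [
        {"resourceType": "Organization", "searchParam": "active",
         "fixedValue": "true"},
        {"resourceType": "Organization", "searchParam": "type",
         "qualifier": "NOT", "fixedValue": "other"},
    ],
}

organization = {
    "resourceType": "Organization",
    "identifier": [{
        "system": "http://mysite.com/fhir/system/our-internal-organization-id",
        "value": "MyOrg-123",
    }],
    "name": "MyOrganization",
    "active": True,
}


def search_value(resource, param):
    """Hypothetical extractor: map a SearchParameter name to a query value."""
    if param == "identifier":
        ident = resource["identifier"][0]  # assumption: first identifier only
        return f"{ident['system']}|{ident['value']}"
    return str(resource[param])


def candidate_queries(rules, resource):
    """Build one search URL per applicable candidateSearchParams entry."""
    rtype = resource["resourceType"]
    filters = []
    for flt in rules["candidateFilterSearchParams"]:
        if flt["resourceType"] in ("*", rtype):
            name = flt["searchParam"]
            if flt.get("qualifier") == "NOT":
                name += ":not"  # assumption: NOT maps to the :not modifier
            filters.append((name, flt["fixedValue"]))
    urls = []
    for entry in rules["candidateSearchParams"]:
        if entry["resourceType"] in ("*", rtype):
            params = [(p, search_value(resource, p))
                      for p in entry["searchParams"]]
            # Keep ':', '/' and '|' unescaped so the URLs stay readable.
            query = urlencode(params + filters, safe=":/|")
            urls.append(f"{BASE_URL}/{rtype}?{query}")
    return urls


for url in candidate_queries(rules, organization):
    print(url)
```

Running the generated URLs against a test server, and checking how many resources each one returns, gives a rough idea of how selective the chosen SearchParameters are.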
Field Name | Brief Description | Notes and Comments |
---|---|---|
version |
Identifies the current version of your rule definition JSON document. Mandatory field; can be any non-empty string of at most 16 characters. | Useful for debugging purposes: newly created MDM links will be associated with this version.
It is highly recommended to change this version whenever you change your rule definition JSON document, as this makes it easier to identify unwanted MDM links and understand why they were created. |
matchFields |
Used to specify exactly how to compare one or more fields of the incoming resource to the candidates found. | Many different comparison algorithms are described in the HAPI-FHIR documentation.
Each entry can use either a An entry from The |
matchResultMap |
This map lists the specific ways for two resources to match together and be linked. | The key of each map entry lists, by name, the matchFields entries to be used for comparisons. The matchFields names in the key are separated by commas: matchFieldNameA,matchFieldNameB,matchFieldNameC for example. Each matchFields entry in the key is evaluated against the incoming resource. The value of each map entry is the resulting link type: MATCH or POSSIBLE_MATCH.
It is not necessary to include an entry in the map for
Only one link is kept between two resources. If multiple entries in the map are found to be true between two resources, only one link type result is kept, with MATCH taking precedence over POSSIBLE_MATCH.
For performance, try to minimize both the number of entries in the map and the overlapping comparisons between them.
Example of unnecessary overlapping comparisons: Here, the second matchFieldA,matchFieldB,matchFieldC entry is redundant with the first matchFieldA,matchFieldB entry and would not make the process create any additional link. This is because the first matchFieldA,matchFieldB entry makes the MDM module create a MATCH link whether or not the matchFieldC comparison is true. The second matchFieldA,matchFieldB,matchFieldC entry should be removed, as it only makes the matching process slower for no additional result.
Another example of unnecessary overlapping comparisons: Again, the second entry is not required and should be removed, as the first entry would always create MATCH links whether or not the matchFieldB or matchFieldC comparisons are true.
A useful way of using overlapping comparisons: This is a valid example of overlapping comparisons. The first entry creates POSSIBLE_MATCH links when the matchFieldA comparison is true; however, if the matchFieldB and matchFieldC comparisons are also found to be true, a MATCH link is created instead of a POSSIBLE_MATCH link, as the MATCH result takes precedence.
Finally, the order of match fields in the map key doesn't matter, as all comparisons are done regardless: Both entries would produce exactly the same links, so only one entry should be kept. |
eidSystems |
Optional field used to specify which identifiers can be expected and used as unique identifiers on incoming resources. | Using this field affects the way the MDM module processes incoming resources. During the second phase, MDM links and Golden Record resources are created differently depending on whether the 'Prevent modification of External EIDs' and 'Prevent multiple EIDs from existing simultaneously on a target resource' properties are enabled in the MDM Configuration. See the 'Using Enterprise Identifiers (EIDs) in MDM Rule Definition' section for more details. |
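The matchResultMap behaviour described in the table (comma-separated keys, MATCH taking precedence over POSSIBLE_MATCH, and redundant overlapping entries) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the MDM module's implementation: each matchFields comparison is reduced to a true/false outcome, the example maps are illustrative, and the function names `resolve_link` and `redundant_entries` are hypothetical.

```python
# Link types ranked by precedence: MATCH wins over POSSIBLE_MATCH.
PRECEDENCE = {"POSSIBLE_MATCH": 1, "MATCH": 2}


def key_fields(key):
    """Split a map key such as 'matchFieldA,matchFieldB' into a set of names."""
    return frozenset(name.strip() for name in key.split(","))


def resolve_link(match_result_map, true_fields):
    """Return the single link type kept for a pair of resources, or None.

    true_fields is the set of matchFields names whose comparison was true.
    """
    best = None
    for key, link in match_result_map.items():
        if key_fields(key) <= true_fields:
            if best is None or PRECEDENCE[link] > PRECEDENCE[best]:
                best = link
    return best


def redundant_entries(match_result_map):
    """List keys that can never change the result because another entry,
    needing the same or fewer fields, yields an equal or stronger link."""
    entries = [(key, key_fields(key), PRECEDENCE[link])
               for key, link in match_result_map.items()]
    redundant = []
    for i, (key, names, rank) in enumerate(entries):
        for j, (_, other_names, other_rank) in enumerate(entries):
            if i == j or not other_names <= names:
                continue
            # For identical ranks and identical field sets, flag only the
            # later entry so that exactly one of the duplicates is kept.
            stronger = other_rank > rank
            tie = other_rank == rank and (other_names < names or j < i)
            if stronger or tie:
                redundant.append(key)
                break
    return redundant


useful = {
    "matchFieldA": "POSSIBLE_MATCH",
    "matchFieldA,matchFieldB,matchFieldC": "MATCH",
}
overlapping = {
    "matchFieldA,matchFieldB": "MATCH",
    "matchFieldA,matchFieldB,matchFieldC": "MATCH",
}

print(resolve_link(useful, {"matchFieldA"}))
print(resolve_link(useful, {"matchFieldA", "matchFieldB", "matchFieldC"}))
print(redundant_entries(overlapping))
print(redundant_entries(useful))
```

With the `useful` map, a pair where only matchFieldA is true resolves to POSSIBLE_MATCH and a pair where all three fields are true resolves to MATCH. With the `overlapping` map, the three-field entry is reported as redundant, which mirrors the advice to remove it.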