Smile CDR v2024.05.PRE

22.2.1 MDM Rule Definition
Experimental

 

In the MDM User Interface, you need to provide the MDM Rule Definition Script, a JSON document describing exactly how and why two FHIR Resources should be linked together.

Here are the minimal fields that must be included in this JSON document:

{
  "version": "1",
  "mdmTypes": [],
  "candidateSearchParams": [],
  "candidateFilterSearchParams": [],
  "matchFields": [],
  "matchResultMap": {}
}

The HAPI FHIR MDM Rules section introduces them. Below, you will find more detailed information about each field.

MDM processing is divided into two main phases. The first phase finds candidate FHIR Resources. The second phase compares the newly created or updated input FHIR Resource more precisely against the candidates found in the first phase, and creates links between them.

The first phase uses candidateSearchParams and candidateFilterSearchParams on the FHIR Resource types listed in mdmTypes to find the candidates. The second phase uses matchFields and matchResultMap to create the MDM links.

Also, the optional eidSystems field can be used to change how the MDM module creates links and Golden Record resources. See the 'Using Enterprise Identifiers (EIDs) in MDM Rule Definition' section for more details.
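
As a quick local sanity check before pasting a rule definition into the MDM User Interface, the list of minimal fields above can be verified with a few lines of Python. This is only a sketch: the helper name find_missing_fields is hypothetical and is not part of Smile CDR or HAPI FHIR, and it checks for the presence of the top-level fields only, not their contents.

```python
import json

# Required top-level fields, taken from the minimal JSON document shown above.
REQUIRED_FIELDS = (
    "version",
    "mdmTypes",
    "candidateSearchParams",
    "candidateFilterSearchParams",
    "matchFields",
    "matchResultMap",
)


def find_missing_fields(rule_definition_json: str) -> list[str]:
    """Return the required top-level fields absent from the JSON document.

    Hypothetical helper for illustration; the optional eidSystems field
    is deliberately not checked.
    """
    rules = json.loads(rule_definition_json)
    return [field for field in REQUIRED_FIELDS if field not in rules]


minimal = """
{
  "version": "1",
  "mdmTypes": [],
  "candidateSearchParams": [],
  "candidateFilterSearchParams": [],
  "matchFields": [],
  "matchResultMap": {}
}
"""

print(find_missing_fields(minimal))  # prints []
```

A document that omits some of the fields returns their names in the order listed in REQUIRED_FIELDS, which makes it easy to see what still has to be filled in.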

22.2.2 Finding Candidates

 
Field Name | Brief Description | Notes and Comments
mdmTypes | Lists the FHIR Resource types that the MDM module will analyze. | Any FHIR Resource type that has an active identifier SearchParameter can be configured in MDM.
  • Most common FHIR Resources used in MDM:
    • Patient, Practitioner, Organization, Location, Person
  • Others:
    • Account, CareTeam, RelatedPerson, Group, Device, InsurancePlan, etc.

Even if multiple types are listed, the MDM module will only link resources of the same type together. For example, if mdmTypes lists Patient and Practitioner, no Patient will be linked to a Practitioner, and vice versa.

candidateSearchParams | Lists the SearchParameters of which at least one must match exactly before two resources are considered for matching. | Candidate FHIR Resources are searched using the listed SearchParameters and the field values of the newly created or updated input FHIR Resource. The candidates found by these searches are then compared with the input FHIR Resource more precisely in the second phase.

One database search is run for each entry in the candidateSearchParams array, and all of these searches are executed in parallel.

The FHIR Resources must already be indexed with each SearchParameter used. Custom SearchParameters can be created and indexed beforehand, then used here. The performance of the first phase depends entirely on the speed of these searches.

Note that if the search queries find too many candidates, the matching process slows down considerably. For that reason, prefer fields that divide the data into many small groups.
Example: most identifiers, phone number, date of birth.

Fields that divide the data into a few large groups are poor choices for performance.
Example: gender, given name, identifiers shared by many resources, province or state.

Also, location-related data, or data that changes regularly within a resource, might prevent the searches from finding candidates that would otherwise have matched during the matching phase.
Example: street address.

candidateFilterSearchParams | Adds filters to the candidateSearchParams searches, to further reduce the number of candidates to analyze. | The fields used in these filters must already be indexed with a SearchParameter. Custom SearchParameters can also be created and indexed beforehand for this purpose.

An optional qualifier can also be used with the SearchParameter. Its value is one of ABOVE, BELOW, NOT, IN, NOT_IN, TEXT or OF_TYPE.

Example of filtering: select only resources with an active status, or only resources whose family name is not equal to a particular test value.

Typically, it is used to exclude resources from groups of data selected in candidateSearchParams.

Example: select resources excluding identifiers of a particular system or value.

Other examples:

[ {
    "resourceType": "Patient",
    "searchParam": "family",
    "qualifier": "NOT",
    "fixedValue": "TestFamilyName"
}, {
    "resourceType": "Patient",
    "searchParam": "active",
    "fixedValue": "true"
}, {
    "resourceType": "Patient",
    "searchParam": "language",
    "qualifier": "NOT",
    "fixedValue": "fr-FR"
} ]
eidSystems | Optional field specifying which identifiers can be expected and used as unique identifiers on incoming resources. | Using this field affects how the MDM module processes incoming resources. During the first phase, it first tries to find Golden Record resources having the specified EID before finding candidates. See the 'Using Enterprise Identifiers (EIDs) in MDM Rule Definition' section for more details.
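
The advice above about choosing candidateSearchParams fields that divide the data into many small groups can be checked roughly on a data sample. The sketch below is an illustration only: the helper average_group_size is hypothetical, the patient records are made up, and real FHIR Resources are nested rather than flat key/value pairs.

```python
from collections import Counter


def average_group_size(resources: list[dict], field: str) -> float:
    """Average number of resources that share the same value of `field`.

    A small value means the field splits the data into many small groups,
    which suits candidateSearchParams; a large value means few large groups.
    """
    counts = Counter(r[field] for r in resources if field in r)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Made-up sample data, flattened to simple key/value pairs for illustration.
patients = [
    {"gender": "female", "birthdate": "1980-01-01", "phone": "555-0001"},
    {"gender": "female", "birthdate": "1975-06-30", "phone": "555-0002"},
    {"gender": "male", "birthdate": "1980-01-01", "phone": "555-0003"},
    {"gender": "male", "birthdate": "1992-11-12", "phone": "555-0004"},
]

for field in ("gender", "birthdate", "phone"):
    print(field, average_group_size(patients, field))
```

On this sample, gender produces the largest groups and phone the smallest, which mirrors the examples given for poor and good candidate fields.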

Usually, it is possible to test the searches that will be done during the first phase to find the candidate resources, to assess their performance.

For example, using this sample MDM rule definition (excluding matchFields and matchResultMap for simplicity):

{
  "version": "v2022-10-01",
  "mdmTypes": [ "Organization" ],
  "candidateSearchParams": [
      {
          "resourceType": "*",
          "searchParams": [
              "identifier"
          ]
      },
      {
          "resourceType": "Organization",
          "searchParams": [
              "name"
          ]
      }
  ],
  "candidateFilterSearchParams": [
      {
          "resourceType": "Organization",
          "searchParam": "active",
          "fixedValue": "true"
      },
      {
          "resourceType": "Organization",
          "searchParam": "type",
          "qualifier": "NOT",
          "fixedValue": "other"
      }
  ]
}

Here, the JSON document specifies that two searches should be done to find candidates: one with the identifier SearchParameter and one with the name SearchParameter. The results of both searches are also filtered to include active resources only and to exclude all resources having a type code value of 'other'.

When adding this new Organization resource:

{
  "resourceType": "Organization",
  "identifier": [
      {
          "system": "http://mysite.com/fhir/system/our-internal-organization-id",
          "value": "MyOrg-123"
      }
  ],
  "name": "MyOrganization",
  "active": true,
  "type": [
      {
          "coding": [
              {
                  "system": "http://terminology.hl7.org/CodeSystem/organization-type",
                  "code": "edu",
                  "display": "Educational Institute"
              }
          ]
      }
  ]
}

These equivalent searches will be run in parallel to find the candidate resources:
http://localhost:8000/Organization?identifier=http://mysite.com/fhir/system/our-internal-organization-id|MyOrg-123&active=true&type:not=other
http://localhost:8000/Organization?name=MyOrganization&active=true&type:not=other

All Organization resources found by these queries will be kept for the second phase, and be matched more precisely with the new 'MyOrg-123' Organization resource. If no Organization resource is returned by the search queries, then the second phase is skipped, and no new MDM link is created (besides the one to a new Golden Record).

These searches are specific and fast, and should return few Organization resources as candidates, which is better for performance.

Ideally, these searches should be tested beforehand to make sure they run quickly and don't return too many resources.
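
To help test the searches beforehand, the equivalent URLs can be derived from the rule definition and a sample resource. The following is a minimal sketch of that derivation for the Organization example above, not the MDM module's actual implementation. Its assumptions: the helper names are hypothetical, only the identifier parameter and simple top-level fields are handled, a qualifier is mapped to a FHIR search modifier by lower-casing it and replacing underscores with hyphens (NOT becomes :not), "*" is accepted as a wildcard resource type in filters too, and values are not URL-encoded, as in the URLs shown above.

```python
BASE_URL = "http://localhost:8000"  # same endpoint as in the example above

rules = {
    "mdmTypes": ["Organization"],
    "candidateSearchParams": [
        {"resourceType": "*", "searchParams": ["identifier"]},
        {"resourceType": "Organization", "searchParams": ["name"]},
    ],
    "candidateFilterSearchParams": [
        {"resourceType": "Organization", "searchParam": "active",
         "fixedValue": "true"},
        {"resourceType": "Organization", "searchParam": "type",
         "qualifier": "NOT", "fixedValue": "other"},
    ],
}

organization = {
    "resourceType": "Organization",
    "identifier": [{
        "system": "http://mysite.com/fhir/system/our-internal-organization-id",
        "value": "MyOrg-123",
    }],
    "name": "MyOrganization",
    "active": True,
}


def search_value(resource: dict, search_param: str) -> str:
    """Simplified value lookup: identifier tokens and top-level fields only."""
    if search_param == "identifier":
        ident = resource["identifier"][0]
        return f'{ident["system"]}|{ident["value"]}'
    return str(resource[search_param])


def filter_clause(entry: dict) -> str:
    """Turn one candidateFilterSearchParams entry into a query clause."""
    qualifier = entry.get("qualifier")
    # Assumed mapping from qualifier to FHIR modifier, e.g. NOT -> :not
    suffix = ":" + qualifier.lower().replace("_", "-") if qualifier else ""
    return f'{entry["searchParam"]}{suffix}={entry["fixedValue"]}'


def candidate_search_urls(rules: dict, resource: dict) -> list[str]:
    """Build one search URL per applicable candidateSearchParams entry."""
    rtype = resource["resourceType"]
    filters = [filter_clause(f) for f in rules["candidateFilterSearchParams"]
               if f["resourceType"] in ("*", rtype)]
    urls = []
    for entry in rules["candidateSearchParams"]:
        if entry["resourceType"] not in ("*", rtype):
            continue
        clauses = [f"{p}={search_value(resource, p)}"
                   for p in entry["searchParams"]]
        urls.append(f"{BASE_URL}/{rtype}?" + "&".join(clauses + filters))
    return urls


for url in candidate_search_urls(rules, organization):
    print(url)
```

Each printed URL can then be run against a test server to check how long it takes and how many resources it returns.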

22.2.3 Matching and Creating Links

 
Field Name | Brief Description | Notes and Comments
version | Identifies the current version of your rule definition JSON document. | Mandatory field; can be any non-empty string of at most 16 characters. Useful for debugging, because newly created MDM links are associated with this version.

It is highly recommended to change this version whenever you change your rule definition JSON document, as this makes it easier to identify unwanted MDM links and understand why they were created.

matchFields | Specifies exactly how to compare one or more fields of the incoming resource to the candidates found. | Many different comparison algorithms are described in the HAPI-FHIR documentation.

Each entry can use either a resourcePath or a more customizable fhirPath to retrieve the resource field values used for the comparisons.

An entry from the matchFields array is used only if its name appears in at least one of the matchResultMap keys. Unused entries can be safely removed from the array.

The name field should not contain any comma, because commas are used to separate the names in matchResultMap keys.

matchResultMap | Lists the specific ways in which two resources can match and be linked. | Each key of the map lists, by name, the matchFields entries to use for comparison, separated by commas: matchFieldNameA,matchFieldNameB,matchFieldNameC for example. Each matchFields entry named in the key is evaluated against the incoming resource. The value of each map entry is the resulting link type: MATCH or POSSIBLE_MATCH.

It is not necessary to include a map entry for a NO_MATCH result because, by default, no link is created when two resources don't match. Links are created for MATCH and POSSIBLE_MATCH results only.

Only one link is kept between two resources. If multiple entries in the map are satisfied for two resources, only one link type is kept, and MATCH always takes precedence over POSSIBLE_MATCH.

For better performance, try to minimize the number of entries in the map and avoid overlapping comparisons.

Example of unnecessary overlapping comparisons:
{
    "matchFieldA,matchFieldB" : "MATCH",
    "matchFieldA,matchFieldB,matchFieldC" : "MATCH"
}
Here, the second entry (matchFieldA,matchFieldB,matchFieldC) is redundant with the first entry (matchFieldA,matchFieldB) and would not cause any additional link to be created.
This is because the first entry makes the MDM module create a MATCH link whether or not the matchFieldC comparison is true.
The second entry should be removed, as it only makes the matching process slower without changing the result.

Another example of unnecessary overlapping comparisons:

{
    "matchFieldA" : "MATCH",
    "matchFieldA,matchFieldB,matchFieldC" : "MATCH"
}
Again, the second entry is not required and should be removed, as the first entry always creates MATCH links whether or not the matchFieldB and matchFieldC comparisons are true.

A useful way of using overlapping comparisons:

{
    "matchFieldA" : "POSSIBLE_MATCH",
    "matchFieldA,matchFieldB,matchFieldC" : "MATCH"
}
This is a valid example of overlapping comparisons. The first entry creates a POSSIBLE_MATCH link when the matchFieldA comparison is true. However, if the matchFieldB and matchFieldC comparisons are also true, a MATCH link is created instead of a POSSIBLE_MATCH link, as the MATCH result takes precedence.

Finally, the order of match fields in the map key doesn't matter, as all comparisons are done regardless:

{
    "matchFieldA,matchFieldB,matchFieldC" : "MATCH",
    "matchFieldC,matchFieldA,matchFieldB" : "MATCH"
}
Both entries would produce exactly the same links, so only one entry should be kept.
eidSystems | Optional field specifying which identifiers can be expected and used as unique identifiers on incoming resources. | Using this field affects how the MDM module processes incoming resources. During the second phase, MDM links and Golden Record resources are created differently depending on whether the 'Prevent modification of External EIDs' and 'Prevent multiple EIDs from existing simultaneously on a target resource' properties are enabled in the MDM Configuration. See the 'Using Enterprise Identifiers (EIDs) in MDM Rule Definition' section for more details.
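
The matchResultMap rules described above (comma-separated keys, MATCH taking precedence over POSSIBLE_MATCH, and redundant overlapping entries) can be sketched in a few lines of Python. This is a simplified model, not the MDM module's code: each match field is assumed to have already been reduced to true or false for one pair of resources, and the function names link_type and redundant_keys are hypothetical. The redundancy check covers only the cases shown in the examples, where another entry with the same result uses a subset of the same fields or the same fields in another order.

```python
def link_type(match_result_map: dict[str, str], true_fields: set[str]) -> str:
    """Return the link type for one pair of resources.

    `true_fields` holds the names of the matchFields entries whose
    comparison was true for this pair.
    """
    results = {
        result
        for key, result in match_result_map.items()
        if set(key.split(",")) <= true_fields  # every field in the key is true
    }
    if "MATCH" in results:  # MATCH takes precedence over POSSIBLE_MATCH
        return "MATCH"
    if "POSSIBLE_MATCH" in results:
        return "POSSIBLE_MATCH"
    return "NO_MATCH"  # no entry satisfied, so no link is created


def redundant_keys(match_result_map: dict[str, str]) -> list[str]:
    """List keys made redundant by another entry with the same result."""
    entries = [(key, frozenset(key.split(",")), result)
               for key, result in match_result_map.items()]
    redundant = []
    for i, (key, fields, result) in enumerate(entries):
        for j, (_, other_fields, other_result) in enumerate(entries):
            if i == j or result != other_result:
                continue
            # Another entry needs fewer fields, or the same fields and
            # appears earlier in the map.
            if other_fields < fields or (other_fields == fields and j < i):
                redundant.append(key)
                break
    return redundant


useful = {
    "matchFieldA": "POSSIBLE_MATCH",
    "matchFieldA,matchFieldB,matchFieldC": "MATCH",
}
overlapping = {
    "matchFieldA,matchFieldB": "MATCH",
    "matchFieldA,matchFieldB,matchFieldC": "MATCH",
}

print(link_type(useful, {"matchFieldA"}))
print(link_type(useful, {"matchFieldA", "matchFieldB", "matchFieldC"}))
print(redundant_keys(overlapping))
```

Running redundant_keys on a map before deploying it is one way to spot entries that would slow matching down without changing which links are created.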