6.18.1Working with Duplicates
Experimental

 

FHIR repositories can end up containing duplicate resources. This page describes the tools that Smile CDR provides for working with duplicated data.

| Cause of Duplicates | Recommended Approach |
|---|---|
| The same entities were imported from different source systems. E.g. Jane Smith's Patient record is imported from the Lab system as Patient/123 and from the Pharmacy system as Patient/456. | Preserve the source data as is so that future data continues to be properly associated to the right patients. Use MDM to establish Golden Resources and MDM LINK records outside of the data and use MDM features like Observation?patient=Patient/123&_mdm=true to link the data at search time. |
| Data was accidentally duplicated on import. E.g. the same data was accidentally loaded twice (as POST new resources) or Conditional Create directives failed to match the intended target correctly. This kind of unintended duplication can also occur when translating HL7v2, CDA, or CSV data into FHIR resources. | In the case where the data was accidentally duplicated, it may make sense to "clean up" the duplicates. See below for details on Smile CDR tools to merge such duplicates. |
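
As a small illustration of the search-time linking mentioned in the first row, the following sketch builds a search URL carrying the _mdm=true parameter. The base URL and the helper name mdm_search_url are hypothetical and used only for illustration; only the query string itself comes from this page.

```python
from urllib.parse import urlencode


def mdm_search_url(base_url: str, resource_type: str, patient_ref: str) -> str:
    """Build a search URL that asks the server to follow MDM links at search time."""
    # safe="/" keeps the slash in "Patient/123" readable instead of encoding it as %2F.
    query = urlencode({"patient": patient_ref, "_mdm": "true"}, safe="/")
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


# Hypothetical FHIR endpoint base URL, for illustration only.
print(mdm_search_url("http://localhost:8000", "Observation", "Patient/123"))
```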

6.18.1.1Deduplication Operations

Smile CDR provides two operations to deduplicate data:

  • $merge, which is a backport of the FHIR R5 Patient/$merge specification to FHIR R4
  • $replace-references, which performs only the reference-updating part of the $merge operation

6.18.1.1.1`$merge` Operation

See the FHIR R5 Patient/$merge specification page for a description of this operation. Note that the Provenance functionality is not yet supported, but most other parts of the specification are. See the bottom of this page for details on the current roadmap for enhancing Smile CDR deduplication functionality.

6.18.1.1.1.1`$merge` Input Parameters

| Name | Type | Default | Notes |
|---|---|---|---|
| source-patient-identifier | Identifier | | List of source patient identifiers |
| source-patient | Reference | | Source patient |
| target-patient-identifier | Identifier | | List of target patient identifiers |
| target-patient | Reference | | Target patient |
| result-patient | Patient | | Optional merged patient resource |
| preview | Boolean | false | If true, no changes will be made and the response will summarize what would happen were the merge to occur |
| delete-source | Boolean | false | If true, delete the source resource |
| resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |

See the FHIR R5 Patient/$merge specification for a detailed description of these input parameters.
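
For readers scripting this operation, the following is a minimal sketch of how a client might assemble the Parameters request body from the inputs listed above. The function name build_merge_parameters is hypothetical, and the sketch covers only the reference-based inputs plus the optional flags; it does not send the request.

```python
import json


def build_merge_parameters(source_ref, target_ref, preview=False,
                           delete_source=False, resource_limit=None):
    """Assemble a FHIR Parameters body for POST /Patient/$merge."""
    parameter = [
        {"name": "source-patient", "valueReference": {"reference": source_ref}},
        {"name": "target-patient", "valueReference": {"reference": target_ref}},
    ]
    # Optional inputs are only sent when they differ from the documented defaults.
    if preview:
        parameter.append({"name": "preview", "valueBoolean": True})
    if delete_source:
        parameter.append({"name": "delete-source", "valueBoolean": True})
    if resource_limit is not None:
        parameter.append({"name": "resource-limit", "valueInteger": resource_limit})
    return {"resourceType": "Parameters", "parameter": parameter}


# json.dumps renders Python True as the JSON boolean true.
print(json.dumps(build_merge_parameters("Patient/2", "Patient/3", preview=True), indent=2))
```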

resource-limit is a Smile CDR addition to protect users from accidentally changing too many resources at once. If resource-limit is larger than 10000, the value 10000 will be used.

If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.

When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
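
The rules above can be restated as a short sketch. This is not Smile CDR's implementation, only a client-side model of the documented behaviour: the default of 512, the cap of 10000, the limit being ignored for asynchronous requests, and batches of 1024 patches. All function and constant names are hypothetical.

```python
import math

DEFAULT_RESOURCE_LIMIT = 512   # documented default
MAX_RESOURCE_LIMIT = 10000     # documented cap on resource-limit
ASYNC_BATCH_SIZE = 1024        # documented async batch size


def effective_resource_limit(requested=None, respond_async=False):
    """Return the limit that applies, or None when the limit is ignored."""
    if respond_async:
        return None  # Prefer: respond-async means resource-limit has no effect
    if requested is None:
        return DEFAULT_RESOURCE_LIMIT
    return min(requested, MAX_RESOURCE_LIMIT)


def sync_request_would_fail(resource_count, requested=None):
    """True if a synchronous request is expected to fail with 412 Precondition Failed."""
    return resource_count > effective_resource_limit(requested)


def async_batch_count(resource_count):
    """Number of batches needed to patch resource_count resources asynchronously."""
    return math.ceil(resource_count / ASYNC_BATCH_SIZE)
```

For example, under these rules a synchronous request touching 600 resources with the default limit would be rejected, while the same merge run asynchronously would complete in a single batch.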

6.18.1.1.1.2`$merge` Output Parameters

| Name | Type | Notes |
|---|---|---|
| input | Parameters | A copy of the input parameters used in the $merge operation |
| outcome | OperationOutcome | Details about the result of the merge |
| result | Patient | The merged Patient resource |
| task | Task | If the merge operation was performed asynchronously, this Task resource provides details about the status of the merge operation |
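
A client will typically want to pick these named parts out of the returned Parameters resource. The sketch below shows one way to do that on an already-parsed JSON response; the helper names index_parameters and outcome_messages are hypothetical.

```python
def index_parameters(parameters: dict) -> dict:
    """Map each output parameter name (input, outcome, result, task) to its resource."""
    return {p["name"]: p.get("resource") for p in parameters.get("parameter", [])}


def outcome_messages(parameters: dict) -> list:
    """Return (severity, text) pairs from the outcome OperationOutcome, if present."""
    outcome = index_parameters(parameters).get("outcome") or {}
    return [
        (issue.get("severity"), issue.get("details", {}).get("text"))
        for issue in outcome.get("issue", [])
    ]
```

A missing task entry (index_parameters(response).get("task") is None) indicates that the response does not describe an asynchronous job.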

6.18.1.1.1.3Merge Example 1: Merge Patient/2 into Patient/3 synchronously.

Input:

POST /Patient/$merge
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "source-patient",
      "valueReference": {
        "reference": "Patient/2"
      }
    },
    {
      "name": "target-patient",
      "valueReference": {
        "reference": "Patient/3"
      }
    }
  ]
}

Output:

HTTP/1.1 200 OK

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "input",
      "resource": {
        "resourceType": "Parameters",
        "parameter": [
          {
            "name": "source-patient",
            "valueReference": {
              "reference": "Patient/2"
            }
          },
          {
            "name": "target-patient",
            "valueReference": {
              "reference": "Patient/3"
            }
          }
        ]
      }
    },
    {
      "name": "outcome",
      "resource": {
        "resourceType": "OperationOutcome",
        "issue": [
          {
            "severity": "information",
            "details": {
              "text": "Merge operation completed successfully."
            }
          }
        ]
      }
    },
    {
      "name": "result",
      "resource": {
        "resourceType": "Patient",
        "id": "3",
        "identifier": [
          {
            "system": "SYS2A",
            "value": "VAL2A"
          },
          {
            "system": "SYS2B",
            "value": "VAL2B"
          },
          {
            "system": "SYSC",
            "value": "VALC"
          },
          {
            "use": "old",
            "system": "SYS1A",
            "value": "VAL1A"
          },
          {
            "use": "old",
            "system": "SYS1B",
            "value": "VAL1B"
          }
        ],
        "link": [
          {
            "other": {
              "reference": "Patient/2"
            },
            "type": "replaces"
          }
        ]
      }
    }
  ]
}

6.18.1.1.1.4Merge Example 2: Preview merge Patient/2 into Patient/3 synchronously.

Input:

POST /Patient/$merge
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "source-patient",
      "valueReference": {
        "reference": "Patient/2"
      }
    },
    {
      "name": "target-patient",
      "valueReference": {
        "reference": "Patient/3"
      }
    },
    {
      "name": "preview",
      "valueBoolean": true
    }
  ]
}

Output:

HTTP/1.1 200 OK

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "input",
      "resource": {
        "resourceType": "Parameters",
        ...copy of input...
      }
    },
    {
      "name": "outcome",
      "resource": {
        "resourceType": "OperationOutcome",
        "issue": [
          {
            "severity": "information",
            "details": {
              "text": "Preview only merge operation - no issues detected"
            },
            "diagnostics": "Merge would update 25 resources"
          }
        ]
      }
    },
    {
      "name": "result",
      "resource": {
        "resourceType": "Patient",
        ...merged patient...
      }
    }
  ]
}
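
One possible use of the preview is to decide whether the real merge should be requested asynchronously. The sketch below assumes that the diagnostics text always has the form shown in the example above ("Merge would update N resources"); that wording is taken from this single example and is not a documented contract, so treat the parsing as illustrative. The function names are hypothetical.

```python
import re
from typing import Optional


def preview_update_count(response: dict) -> Optional[int]:
    """Extract the resource count from a preview response, or None if not found."""
    for param in response.get("parameter", []):
        if param.get("name") != "outcome":
            continue
        for issue in param.get("resource", {}).get("issue", []):
            # Assumption: the count appears as "would update <N> resources".
            match = re.search(r"would update (\d+) resources?", issue.get("diagnostics", ""))
            if match:
                return int(match.group(1))
    return None


def needs_async(count: Optional[int], resource_limit: int = 512) -> bool:
    """True when the previewed count exceeds the synchronous resource-limit."""
    return count is not None and count > resource_limit
```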

6.18.1.1.1.5Merge Example 3: Merge Patient/2 into Patient/3 asynchronously.

Input:

POST /Patient/$merge
Content-Type: application/fhir+json
Prefer: respond-async

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "source-patient",
      "valueReference": {
        "reference": "Patient/2"
      }
    },
    {
      "name": "target-patient",
      "valueReference": {
        "reference": "Patient/3"
      }
    }
  ]
}

Output:

HTTP/1.1 202 Accepted

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "input",
      "resource": {
        "resourceType": "Parameters",
        ...copy of input...
      }
    },
    {
      "name": "task",
      "resource": {
        "resourceType": "Task",
        "id": "352",
        "identifier": [
          {
            "system": "http://hapifhir.io/batch/jobId",
            "value": "26738f4d-266c-4ef6-934f-1d13b1b474b9"
          }
        ],
        "status": "in-progress"
      }
    }
  ]
}

You can poll the status of the returned Task resource to see when it completes. For more detailed status about the background job, you can view the status of the corresponding Smile CDR batch job either through the Web Admin Console or through the Admin JSON API. The id of the Smile CDR batch job is provided as an identifier on the returned Task.
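
The polling described above might look like the following sketch. The caller supplies fetch_task, a hypothetical function that re-reads the Task (for example via GET /Task/352) and returns it as a parsed dict, so the sketch makes no assumptions about the HTTP client. Which statuses count as finished is also an assumption here: the set below is taken from the FHIR R4 Task.status value set, not from Smile CDR documentation.

```python
import time
from typing import Callable, Optional

JOB_ID_SYSTEM = "http://hapifhir.io/batch/jobId"  # identifier system shown in the example above
# Assumption: these FHIR R4 Task.status values mean the job is no longer running.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "rejected", "entered-in-error"}


def batch_job_id(task: dict) -> Optional[str]:
    """Return the batch job id carried as an identifier on the Task, if any."""
    for identifier in task.get("identifier", []):
        if identifier.get("system") == JOB_ID_SYSTEM:
            return identifier.get("value")
    return None


def wait_for_task(fetch_task: Callable[[], dict], interval_seconds: float = 2.0,
                  max_polls: int = 30, sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the Task reaches a terminal status or max_polls reads have been made."""
    task = fetch_task()
    for _ in range(max_polls - 1):
        if task.get("status") in TERMINAL_STATUSES:
            break
        sleep(interval_seconds)
        task = fetch_task()
    return task
```

The returned Task may still be in-progress if max_polls is exhausted, so callers should check its status rather than assume completion.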

6.18.1.1.2`$replace-references` Operation

The $replace-references operation searches for all resources in the repository that have a reference to the source resource, and updates those references to point to the target resource. It is a simplified form of the $merge operation when all you want to do is update references. This operation creates a Transaction Bundle of Patch operations to update the references and returns the output of performing that transaction.

6.18.1.1.2.1`$replace-references` Input Parameters

| Name | Type | Default | Notes |
|---|---|---|---|
| source-reference-id | String | | The id of the source resource reference to be replaced |
| target-reference-id | String | | The id of the target resource reference that the references will be replaced with |
| resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |

The resource-limit parameter is available to control how many resources can be changed by this operation. If resource-limit is larger than 10000, the value 10000 will be used.

If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.

When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
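
As a minimal sketch of how a client might prepare a $replace-references call, the following builds the request headers and the Parameters body from the inputs above. The function names are hypothetical, and the sketch stops short of sending the request.

```python
def request_headers(respond_async: bool = False) -> dict:
    """HTTP headers for the request; Prefer is only added for asynchronous calls."""
    headers = {"Content-Type": "application/fhir+json"}
    if respond_async:
        headers["Prefer"] = "respond-async"  # resource-limit is ignored in this mode
    return headers


def build_replace_references_parameters(source_id: str, target_id: str,
                                        resource_limit=None) -> dict:
    """Assemble a FHIR Parameters body for the $replace-references operation."""
    parameter = [
        {"name": "source-reference-id", "valueString": source_id},
        {"name": "target-reference-id", "valueString": target_id},
    ]
    if resource_limit is not None:
        parameter.append({"name": "resource-limit", "valueInteger": resource_limit})
    return {"resourceType": "Parameters", "parameter": parameter}
```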

6.18.1.1.2.2`$replace-references` Output Parameters

| Name | Type | Notes |
|---|---|---|
| outcome | Bundle | The result of the Bundle patch transaction |
| task | Task | If the operation was performed asynchronously, this Task resource provides details about the status of the operation |

See the $merge operation above for details about the returned Task resource in the case when the operation is performed asynchronously.

6.18.1.1.2.3Replace References Example: Replace all references to Patient/2 with references to Patient/3 synchronously.

Input:

POST /Patient/$replace-references
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "source-reference-id",
      "valueString": "Patient/2"
    },
    {
      "name": "target-reference-id",
      "valueString": "Patient/3"
    }
  ]
}

Output:

HTTP/1.1 200 OK

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "outcome",
      "resource": {
        "resourceType": "Bundle",
        "id": "782add05-549c-4a7e-a687-38c22f2f12d0",
        "type": "transaction-response",
        "entry": [
          {
            "response": {
              "status": "200 OK",
              "location": "CarePlan/62/_history/2",
              "etag": "2",
              "outcome": {
                "resourceType": "OperationOutcome",
                "issue": [
                  {
                    "severity": "information",
                    "code": "informational",
                    "details": {
                      "coding": [
                        {
                          "system": "https://hapifhir.io/fhir/CodeSystem/hapi-fhir-storage-response-code",
                          "code": "SUCCESSFUL_PATCH",
                          "display": "Patch succeeded."
                        }
                      ]
                    },
                    "diagnostics": "Successfully patched resource \"CarePlan/62/_history/2\"."
                  }
                ]
              }
            }
          },
          ... etc outcome of the rest of the patch operations ...
        ]
      }
    }
  ]
}
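
To check a response like the one above programmatically, a client could summarize the entries of the returned transaction-response Bundle. This is a sketch operating on the already-parsed JSON; the helper names patch_summary and patched_locations are hypothetical.

```python
from collections import Counter


def _outcome_entries(response: dict) -> list:
    """Return the entries of the Bundle held in the outcome parameter, if any."""
    for param in response.get("parameter", []):
        if param.get("name") == "outcome":
            return param.get("resource", {}).get("entry", [])
    return []


def patch_summary(response: dict) -> Counter:
    """Count patch results by their HTTP status string, e.g. "200 OK"."""
    return Counter(
        entry.get("response", {}).get("status", "unknown")
        for entry in _outcome_entries(response)
    )


def patched_locations(response: dict) -> list:
    """List the versioned locations of the patched resources."""
    return [
        entry["response"]["location"]
        for entry in _outcome_entries(response)
        if "location" in entry.get("response", {})
    ]
```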

6.18.1.2Deduplication Roadmap

The Deduplication features of Smile CDR are under active development. Here is a roadmap of new features we are planning to roll out.

6.18.1.2.1Feb 2025 Release

6.18.1.2.2May or Aug 2025 Release

  • Create new $undo-merge and $undo-replace-references operations that use the Provenance resources to undo the effects of those operations (assuming that none of the affected resources have subsequently changed).

6.18.1.2.3Sometime after the May 2025 Release

  • Create a third MDM mode: MATCH_AND_MERGE, which is similar to MATCH_ONLY in that it uses MDM Rules but does not create any links or Golden Resources. When enabled, a MATCH_AND_MERGE MDM module will automatically perform a $merge operation on all inbound resources. Any resources that MATCH a single target will be merged into that resource following the MDM Survivorship rules, and all references will be updated to point to the merged resource.
  • Create a new $deduplicate operation that takes a FHIR Bundle resource as input and uses MDM Rules to remove duplicates from the inbound Bundle and update references to point to the matched resource. Think of this as a stronger version of Conditional Create where you now have the full power of MDM matching to find matching resources rather than being limited by the FHIR Conditional Create syntax.
  • Provide this new $deduplicate operation as a Camel Processor
  • Provide a new $submit-for-deduplication operation that works like $mdm-submit and performs MATCH_AND_MERGE on all resources that match the criteria in the request. E.g. If Organization resources with _source=ABC were accidentally duplicated in your FHIR Repository, you could call $submit-for-deduplication with the criteria Organization?_source=ABC to submit all of those Organizations for deduplication. Ones that have existing matches would be deleted and all references updated to point to the remaining copy of that organization.