FHIR Repositories can end up with duplicate resources in them. This page details tools that Smile provides to work with duplicated data.
Cause of Duplicates | Recommended Approach |
---|---|
The same entities were imported from different source systems. E.g. Jane Smith's Patient record is imported from the Lab system as Patient/123 and from the Pharmacy system as Patient/456. | Preserve the source data as is so that future data continues to be properly associated to the right patients. Use MDM to establish Golden Resources and MDM LINK records outside of the data and use MDM features like Observation?patient=Patient/123&_mdm=true to link the data at search time. |
Data was accidentally duplicated on import. E.g. the same data was accidentally loaded twice (as POST new resources) or Conditional Create directives failed to match the intended target correctly. This kind of unintended duplication can also occur when translating HL7v2, CDA, or CSV data into FHIR resources. | In the case where the data was accidentally duplicated, it may make sense to "clean up" the duplicates. See below for details on Smile CDR tools to merge such duplicates. |
Smile CDR provides several operations to deduplicate data:
- $merge, which is a backport of the FHIR R5 Patient/$merge specification to FHIR R4
- $hapi.fhir.replace-references performs only the update references part of this $merge operation
- $hapi.fhir.undo-replace-references undoes the effects of a $hapi.fhir.replace-references operation
- $sdh.mdm-bundle-match processes FHIR Bundles to match resources using MDM rules and optionally merge them using survivorship
See the FHIR R5 Patient/$merge specification page for a description of this operation. See the bottom of this page for details on the current roadmap for enhancing Smile CDR deduplication functionality.
Name | Type | Default | Notes |
---|---|---|---|
source-patient-identifier | Identifier | | List of source patient identifiers |
source-patient | Reference | | Source patient |
target-patient-identifier | Identifier | | List of target patient identifiers |
target-patient | Reference | | Target patient |
result-patient | Patient | | Optional merged patient resource |
preview | Boolean | false | If true, no changes will be made and response will summarize what would happen were the merge to occur |
delete-source | Boolean | false | If true, delete the source resource |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
See the FHIR R5 Patient/$merge specification for a detailed description of these input parameters.
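As a rough illustration, the input parameters in the table above can be assembled into a FHIR Parameters resource programmatically. The following Python sketch is not part of Smile CDR: build_merge_parameters is a hypothetical helper name, and only the parameter names and types are taken from the table.

```python
from typing import Any, Dict, Optional


def build_merge_parameters(
    source_patient: str,
    target_patient: str,
    preview: bool = False,
    delete_source: bool = False,
    resource_limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a Parameters body for Patient/$merge (hypothetical helper)."""
    parameters = [
        {"name": "source-patient", "valueReference": {"reference": source_patient}},
        {"name": "target-patient", "valueReference": {"reference": target_patient}},
    ]
    # Optional flags are only sent when they differ from the documented defaults.
    if preview:
        parameters.append({"name": "preview", "valueBoolean": True})
    if delete_source:
        parameters.append({"name": "delete-source", "valueBoolean": True})
    if resource_limit is not None:
        parameters.append({"name": "resource-limit", "valueInteger": resource_limit})
    return {"resourceType": "Parameters", "parameter": parameters}
```

Serializing the returned dictionary with json.dumps gives a request body that can be posted to /Patient/$merge with the Content-Type application/fhir+json.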
resource-limit is a Smile CDR addition to protect users from accidentally changing too many resources at once. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
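The limit rules described above can be summarized in a few lines of Python. This is a sketch of the documented behaviour only, not the server's implementation; the function and constant names are assumptions made for illustration.

```python
import math
from typing import Optional

DEFAULT_RESOURCE_LIMIT = 512   # documented default for resource-limit
MAX_RESOURCE_LIMIT = 10000     # larger requested values are reduced to this
ASYNC_BATCH_SIZE = 1024        # documented number of patches per async batch


def effective_resource_limit(requested: Optional[int] = None) -> int:
    """Limit applied to a synchronous request."""
    if requested is None:
        return DEFAULT_RESOURCE_LIMIT
    return min(requested, MAX_RESOURCE_LIMIT)


def sync_request_allowed(resources_to_change: int, requested: Optional[int] = None) -> bool:
    """False means the synchronous request would fail with 412 Precondition Failed."""
    return resources_to_change <= effective_resource_limit(requested)


def async_batch_count(resources_to_change: int) -> int:
    """Number of PATCH transaction Bundles needed when Prefer: respond-async is set."""
    return math.ceil(resources_to_change / ASYNC_BATCH_SIZE)
```

For example, a merge touching 2500 resources exceeds the default synchronous limit, but under these rules it would be processed asynchronously in three batches.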
Name | Type | Notes |
---|---|---|
input | Parameters | A copy of the input parameters used in the $merge operation |
outcome | OperationOutcome | Details about the result of the merge |
result | Patient | The merged Patient resource |
task | Task | If the merge operation was performed asynchronously, this Task resource provides details about the status of the merge operation |
With the 2025.08 release, the $merge operation creates a Provenance resource upon successful completion. This Provenance resource contains, in its target element, the versioned references to the target patient, the source patient (if not deleted during the operation), and all other resources updated as part of the operation. The Provenance.activity is set to http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle|merge, and the Provenance.agent.who is populated with a logical reference to the request user.
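To make the shape of this Provenance resource more concrete, the following Python sketch inspects one. The sample resource is made up for illustration (the ids and versions do not come from a real server), and the helper names are assumptions. The only details taken from the description above are the activity code system and code and the use of versioned references in target.

```python
from typing import Dict, List

LIFECYCLE_SYSTEM = "http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle"


def has_lifecycle_activity(provenance: Dict, code: str) -> bool:
    """True if Provenance.activity carries the given lifecycle code."""
    codings = provenance.get("activity", {}).get("coding", [])
    return any(
        c.get("system") == LIFECYCLE_SYSTEM and c.get("code") == code
        for c in codings
    )


def target_references(provenance: Dict) -> List[str]:
    """Versioned references listed in Provenance.target."""
    return [t["reference"] for t in provenance.get("target", []) if "reference" in t]


# Made-up example data: target patient, source patient, one updated resource.
sample = {
    "resourceType": "Provenance",
    "target": [
        {"reference": "Patient/3/_history/2"},
        {"reference": "Patient/2/_history/2"},
        {"reference": "Observation/10/_history/2"},
    ],
    "activity": {"coding": [{"system": LIFECYCLE_SYSTEM, "code": "merge"}]},
}
```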
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Merge operation completed successfully."
}
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
"id": "3",
"identifier": [
{
"system": "SYS2A",
"value": "VAL2A"
},
{
"system": "SYS2B",
"value": "VAL2B"
},
{
"system": "SYSC",
"value": "VALC"
},
{
"use": "old",
"system": "SYS1A",
"value": "VAL1A"
},
{
"use": "old",
"system": "SYS1B",
"value": "VAL1B"
}
],
"link": [
{
"other": {
"reference": "Patient/2"
},
"type": "replaces"
}
]
}
}
]
}
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
},
{
"name": "preview",
"valueBoolean": true
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Preview only merge operation - no issues detected"
},
"diagnostics": "Merge would update 25 resources"
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
...merged patient...
}
}
]
}
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
Prefer: respond-async
{
"resourceType": "Parameters",
"parameter": [ {
"name": "source-patient-identifier",
"valueIdentifier": {
"system" : "urn:oid:1.2.36.146.595.217.0.1",
"value" : "12345"
}
},
{
"name": "target-patient-identifier",
"valueIdentifier": {
"system" : "urn:oid:1.2.36.146.595.217.0.1",
"value" : "12346"
}
}
]
}
Output:
HTTP/1.1 202 Accepted
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "task",
"resource": {
"resourceType": "Task",
"id": "352",
"identifier": [
{
"system": "http://hapifhir.io/batch/jobId",
"value": "26738f4d-266c-4ef6-934f-1d13b1b474b9"
}
],
"status": "in-progress"
}
}
]
}
You can poll the status of the returned Task resource to see when it completes. For more detailed status about the background job, you can view the status of the corresponding Smile CDR batch job either through the Web Admin Console or through the Admin JSON API. The id of the Smile CDR batch job is provided as an identifier on the returned Task.
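A polling loop for the returned Task might look like the sketch below. It makes several assumptions: the caller supplies fetch_task, a hypothetical function that reads Task/{id} from the server and returns it as a dictionary; the set of terminal statuses is taken from the FHIR R4 Task.status codes; and the polling interval and attempt count are arbitrary defaults.

```python
import time
from typing import Callable, Dict, Optional

BATCH_JOB_ID_SYSTEM = "http://hapifhir.io/batch/jobId"
# Task.status codes after which the Task is not expected to change further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "rejected", "entered-in-error"}


def batch_job_id(task: Dict) -> Optional[str]:
    """Return the batch job id carried as an identifier on the Task, if any."""
    for identifier in task.get("identifier", []):
        if identifier.get("system") == BATCH_JOB_ID_SYSTEM:
            return identifier.get("value")
    return None


def poll_task(
    fetch_task: Callable[[str], Dict],
    task_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> Dict:
    """Re-read the Task until it reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"Task/{task_id} did not finish after {max_attempts} attempts")
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.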
The $hapi.fhir.undo-merge operation undoes the effects of the most recent $merge operation on the given source and target ids. This operation uses the Provenance resource that was created by the $merge operation, and restores all the resources that were updated as part of the operation back to their versions before the operation. This restore operation is done as an update, so it actually creates a newer version of each restored resource. The operation is performed as a transaction, so it either restores all or none.
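The statement that a restore is done as an update, and therefore creates a newer version, can be illustrated with a tiny in-memory model. This is a conceptual sketch only: VersionedStore is a made-up stand-in for a versioned repository and says nothing about how Smile CDR stores or restores resources internally.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


class VersionedStore:
    """Minimal in-memory model of a versioned repository (illustration only)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Any]] = {}  # id -> versions, oldest first

    def update(self, resource_id: str, content: Any) -> int:
        """Store a new version and return its version number (starting at 1)."""
        versions = self._history.setdefault(resource_id, [])
        versions.append(copy.deepcopy(content))
        return len(versions)

    def read(self, resource_id: str, version: Optional[int] = None) -> Any:
        """Read the latest version, or a specific historical version."""
        versions = self._history[resource_id]
        return copy.deepcopy(versions[-1] if version is None else versions[version - 1])

    def restore_all(self, targets: List[Tuple[str, int]]) -> Dict[str, int]:
        """Restore each (id, version) pair as a new update, all or nothing."""
        # Validate everything first so a bad entry leaves the store untouched.
        for resource_id, version in targets:
            versions = self._history.get(resource_id, [])
            if not 1 <= version <= len(versions):
                raise KeyError(f"{resource_id} has no version {version}")
        return {rid: self.update(rid, self.read(rid, v)) for rid, v in targets}
```

In this model, restoring version 1 of a resource that is currently at version 2 produces version 3 with the content of version 1, and the history of versions 1 and 2 is preserved.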
The $hapi.fhir.undo-merge operation currently has the following limitations:
- ... $merge operation was performed.
The $hapi.fhir.undo-merge input parameters are a subset of the $merge operation input parameters. They are used to identify the source and target resources from a merge operation that should be restored to their previous version.
Name | Type | Default | Notes |
---|---|---|---|
source-patient-identifier | Identifier | | List of source patient identifiers |
source-patient | Reference | | Source patient |
target-patient-identifier | Identifier | | List of target patient identifiers |
target-patient | Reference | | Target patient |
Name | Type | Notes |
---|---|---|
outcome | OperationOutcome | Outcome of the operation |
Assuming that the $merge operation was previously performed to merge Patient/2 to Patient/3, you can undo that operation with the following request:
Input:
POST /Patient/$hapi.fhir.undo-merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Successfully restored 8 resources to their previous versions based on the Provenance resource: Provenance/1475/_history/1"
}
}
]
}
}
]
}
The $hapi.fhir.replace-references operation searches for all resources in the repository that have a reference to the source resource, and updates those references to point to the target resource. It is a simplified form of the $merge operation when all you want to do is update references.
This operation creates a Transaction Bundle of Patch operations to update the references and returns the output of performing that transaction.
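The effect of this operation on a single resource can be pictured with the Python sketch below, which rewrites matching references in a resource held as a dictionary. It only illustrates the outcome: the server itself issues Patch operations inside a Transaction Bundle, as described above, and replace_references here is a hypothetical local helper, not a Smile CDR API.

```python
from typing import Any


def replace_references(node: Any, source: str, target: str) -> int:
    """Rewrite every {"reference": source} under node in place.

    Returns the number of references that were changed.
    """
    changed = 0
    if isinstance(node, dict):
        if node.get("reference") == source:
            node["reference"] = target
            changed += 1
        for value in node.values():
            changed += replace_references(value, source, target)
    elif isinstance(node, list):
        for item in node:
            changed += replace_references(item, source, target)
    return changed
```

References to other resources, such as a Practitioner, are left untouched because only exact matches on the source reference are rewritten.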
Name | Type | Default | Notes |
---|---|---|---|
source-reference-id | String | | The id of the source resource reference to be replaced |
target-reference-id | String | | The id of the target resource reference that the references will be replaced with |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
The resource-limit parameter is available to control how many resources can be changed by this operation. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
Name | Type | Notes |
---|---|---|
outcome | Bundle | The result of the Bundle patch transaction |
task | Task | If the operation was performed asynchronously, this Task resource provides details about the status of the operation |
See the $merge operation above for details about the returned Task resource in the case when the operation is performed asynchronously.
With the 2025.08 release, the $hapi.fhir.replace-references operation creates a Provenance resource upon successful completion. This Provenance resource contains, in its target element, the versioned references to the target resource, the source resource, and the resources updated as part of the operation. The Provenance.activity is set to http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle|link, and the Provenance.agent.who is populated with a logical reference to the request user. Note that a Provenance resource for this operation is not created if no resources were actually updated because the source resource is not referenced by any resources.
Input:
POST /$hapi.fhir.replace-references
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-reference-id",
"valueString": "Patient/2"
},
{
"name": "target-reference-id",
"valueString": "Patient/3"
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "Bundle",
"id": "782add05-549c-4a7e-a687-38c22f2f12d0",
"type": "transaction-response",
"entry": [
{
"response": {
"status": "200 OK",
"location": "CarePlan/62/_history/2",
"etag": "2",
"outcome": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"code": "informational",
"details": {
"coding": [
{
"system": "https://hapifhir.io/fhir/CodeSystem/hapi-fhir-storage-response-code",
"code": "SUCCESSFUL_PATCH",
"display": "Patch succeeded."
}
]
},
"diagnostics": "Successfully patched resource \"CarePlan/62/_history/2\"."
}
]
}
}
},
... etc outcome of the rest of the patch operations ...
]
}
}
]
}
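When many resources are patched, it can help to reduce the returned outcome Bundle to a compact summary. The following Python sketch assumes the response has the shape shown in the example above; summarize_patch_outcome is a hypothetical helper name.

```python
from typing import Dict


def summarize_patch_outcome(parameters: Dict) -> Dict[str, str]:
    """Map each patched resource location to its response status string."""
    for parameter in parameters.get("parameter", []):
        if parameter.get("name") == "outcome":
            entries = parameter.get("resource", {}).get("entry", [])
            return {
                entry["response"].get("location", ""): entry["response"].get("status", "")
                for entry in entries
                if "response" in entry
            }
    # No outcome parameter, for example when the operation ran asynchronously.
    return {}
```

A response without an outcome parameter yields an empty summary rather than an error, which keeps the helper usable for asynchronous responses that only carry a task parameter.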
The $hapi.fhir.undo-replace-references operation undoes the effects of the most recent $hapi.fhir.replace-references operation on the given source and target ids. This operation uses the Provenance resource that was created by the $hapi.fhir.replace-references operation, and restores all the resources that were updated as part of the operation back to their versions before the operation. This restore operation is done as an update, so it actually creates a newer version of each restored resource. The operation is performed as a transaction, so it either restores all or none.
The $hapi.fhir.undo-replace-references operation currently has the following limitations:
- ... $hapi.fhir.replace-references operation was performed.
Name | Type | Default | Notes |
---|---|---|---|
source-reference-id | String | | The id of the source resource. This must be the same as the source-reference-id that was used in the $hapi.fhir.replace-references operation being undone |
target-reference-id | String | | The id of the target resource. This must be the same as the target-reference-id that was used in the $hapi.fhir.replace-references operation being undone |
Name | Type | Notes |
---|---|---|
outcome | OperationOutcome | The outcome of the operation |
Assuming that the $hapi.fhir.replace-references operation was previously performed to replace references from Patient/2 to Patient/3, you can undo that operation with the following request:
Input:
POST /$hapi.fhir.undo-replace-references
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-reference-id",
"valueString": "Patient/2"
},
{
"name": "target-reference-id",
"valueString": "Patient/3"
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"diagnostics": "Successfully restored 8 resources to their previous versions based on the Provenance resource: Provenance/1234/_history/1"
}
]
}
}
]
}
The $sdh.mdm-bundle-match operation processes FHIR Bundles to match resources against existing resources in the repository using Master Data Management (MDM) rules. This operation helps prevent duplicate resources by identifying resources in an input Bundle that match existing resources in the repository.
The operation can operate in two modes:
- merge=false (default): Removes matched resources from the bundle and updates references
- merge=true: Applies survivorship rules to merge bundle resources with repository resources
Name | Type | Default | Notes |
---|---|---|---|
bundle | Bundle | | The input Bundle containing resources to process for MDM matching |
merge | Boolean | false | When true, matched resources are merged using survivorship rules instead of being removed. The merged resources become UPDATE operations. |
Name | Type | Notes |
---|---|---|
bundle | Bundle | The processed Bundle where matched resources are either removed (merge=false) or updated (merge=true) |
When merge=false:
When merge=true:
- ... If-Match header for optimistic locking
- ... MATCH_AND_LINK or MATCH_ONLY mode
- ... (merge=true), survivorship scripts should be configured
- ... ResourceVersionConflictException is thrown
Input Bundle contains a Patient that matches an existing Patient in the repository:
Input:
POST /Bundle/$sdh.mdm-bundle-match
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "bundle",
"resource": {
"resourceType": "Bundle",
"type": "transaction",
"entry": [
{
"fullUrl": "urn:uuid:patient-temp-id",
"request": {
"method": "POST",
"url": "Patient"
},
"resource": {
"resourceType": "Patient",
"identifier": [
{
"system": "http://example.org/mrn",
"value": "12345"
}
],
"name": [
{
"family": "Smith",
"given": ["John"]
}
]
}
},
{
"request": {
"method": "POST",
"url": "Observation"
},
"resource": {
"resourceType": "Observation",
"status": "final",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "8302-2"
}
]
},
"subject": {
"reference": "urn:uuid:patient-temp-id"
}
}
}
]
}
}
]
}
Output (Patient removed, Observation reference updated):
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "bundle",
"resource": {
"resourceType": "Bundle",
"type": "transaction",
"entry": [
{
"request": {
"method": "POST",
"url": "Observation"
},
"resource": {
"resourceType": "Observation",
"status": "final",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "8302-2"
}
]
},
"subject": {
"reference": "Patient/456"
}
}
}
]
}
}
]
}
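The transformation in this example, where the matched Patient is dropped and the Observation is re-pointed at the existing repository resource, can be sketched in Python as follows. The MDM matching itself is out of scope here: the matches argument is assumed to be supplied by some matching step and maps an entry's fullUrl to the repository resource it matched. Both remove_matched and that mapping are hypothetical and are not part of the Smile CDR API.

```python
import copy
from typing import Any, Dict


def _rewrite(node: Any, matches: Dict[str, str]) -> None:
    """Point references at matched repository resources, in place."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and ref in matches:
            node["reference"] = matches[ref]
        for value in node.values():
            _rewrite(value, matches)
    elif isinstance(node, list):
        for item in node:
            _rewrite(item, matches)


def remove_matched(bundle: Dict, matches: Dict[str, str]) -> Dict:
    """Return a copy of the bundle without matched entries, with references updated."""
    result = copy.deepcopy(bundle)
    # Entries whose fullUrl matched an existing resource are dropped entirely.
    result["entry"] = [
        entry for entry in result.get("entry", [])
        if entry.get("fullUrl") not in matches
    ]
    for entry in result["entry"]:
        _rewrite(entry.get("resource"), matches)
    return result
```

The input bundle is copied rather than modified, so the caller can still compare the original and processed versions.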
Same input as Example 1, but with merge=true:
Input:
POST /Bundle/$sdh.mdm-bundle-match
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "bundle",
"resource": {
...same bundle as Example 1...
}
},
{
"name": "merge",
"valueBoolean": true
}
]
}
Output (Patient becomes UPDATE operation, references updated):
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "bundle",
"resource": {
"resourceType": "Bundle",
"type": "transaction",
"entry": [
{
"request": {
"method": "PUT",
"url": "Patient/456",
"ifMatch": "W/\"2\""
},
"resource": {
"resourceType": "Patient",
"id": "456",
"identifier": [
{
"system": "http://example.org/mrn",
"value": "12345"
}
],
"name": [
{
"family": "Smith",
"given": ["John", "J"]
}
]
}
},
{
"request": {
"method": "POST",
"url": "Observation"
},
"resource": {
"resourceType": "Observation",
"status": "final",
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "8302-2"
}
]
},
"subject": {
"reference": "Patient/456"
}
}
}
]
}
}
]
}
Note how in survivorship mode, the Patient resource becomes a PUT operation that merges data (e.g., additional given name "J") while maintaining the repository resource ID.
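The request portion of such an entry can be sketched as below. The sketch covers only the conversion to a version-guarded PUT. The survivorship rules that produce the merged content are not modelled, and to_update_entry is a hypothetical helper name.

```python
from typing import Dict


def to_update_entry(merged_resource: Dict, repository_id: str, repository_version: str) -> Dict:
    """Build a transaction entry that updates an existing repository resource.

    merged_resource is assumed to already hold the result of survivorship.
    """
    resource = dict(merged_resource, id=repository_id)
    return {
        "request": {
            "method": "PUT",
            "url": f"{resource['resourceType']}/{repository_id}",
            # Weak ETag of the version the merge was based on, as in the example above.
            "ifMatch": f'W/"{repository_version}"',
        },
        "resource": resource,
    }
```

Including ifMatch means the update only applies if the repository resource is still at the version the merge was computed against.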
The Deduplication features of Smile CDR are under active development. Here is a roadmap of new features we are planning to roll out.
- ... $merge and $hapi.fhir.replace-references
- $hapi.fhir.undo-replace-references operation that uses the Provenance resources to undo the effects of $hapi.fhir.replace-references operations (assuming that none of the affected resources have subsequently changed)
- $sdh.mdm-bundle-match operation with merge parameter to support survivorship-based resource merging
- $undo-merge operation that uses the Provenance resources to undo the effects of $merge operations
- ... $merge to all Resource types
- MATCH_AND_MERGE which is similar to MATCH_ONLY in that it uses MDM Rules but does not create any links or Golden Resources. When enabled, a MATCH_AND_MERGE MDM module will automatically perform a $merge operation on all inbound resources. Any resources that MATCH a single target will be merged into that resource following the MDM Survivorship rules and all references will be updated to point to the merged resource.
- $deduplicate operation that takes a FHIR Bundle resource as input and uses MDM Rules to remove duplicates from the inbound Bundle and update references to point to the matched resource. Think of this as a stronger version of Conditional Create where you now have the full power of MDM matching to find matching resources rather than being limited by the FHIR Conditional Create syntax.
- $deduplicate operation as a Camel Processor
- $submit-for-deduplication operation that works like $mdm-submit and performs MATCH_AND_MERGE on all resources that match the criteria in the request. E.g. if Organization resources with _source=ABC were accidentally duplicated in your FHIR Repository, you could call $submit-for-deduplication with the criteria Organization?_source=ABC to submit all of those Organizations for deduplication. Ones that have existing matches would be deleted and all references updated to point to the remaining copy of that organization.